GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction

Abstract

This paper aims to efficiently enable Large Language Models (LLMs) to use multimodal tools. Advanced proprietary LLMs, such as ChatGPT and GPT-4, have shown great potential for tool usage through sophisticated prompt engineering. Nevertheless, these models typically incur prohibitive computational costs and rely on publicly inaccessible data. To address these challenges, we propose GPT4Tools, a method based on self-instruct that enables open-source LLMs, such as LLaMA and OPT, to use tools. It generates an instruction-following dataset by prompting an advanced teacher model with various multimodal contexts. Using Low-Rank Adaptation (LoRA) optimization, our approach enables open-source LLMs to solve a range of visual problems, including visual comprehension and image generation. Moreover, we provide a benchmark that evaluates the ability of LLMs to use tools in both zero-shot and fine-tuning settings. Extensive experiments on various language models demonstrate the effectiveness of our method: it not only significantly improves the accuracy of invoking seen tools, but also enables zero-shot use of unseen tools. The code and demo are available at https://github.com/StevenGrove/GPT4Tools.
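As a rough illustration of the Low-Rank Adaptation idea mentioned in the abstract, the sketch below shows a frozen weight matrix combined with a trainable low-rank update. It is a generic, minimal rendering of LoRA and not the authors' training code: the dimensions, the rank `r`, the scaling factor `alpha`, and the helper names `matvec`, `lora_forward`, and `trainable_params` are all assumptions chosen for the example, and the abstract does not say which layers are adapted.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W0, A, B, x, alpha):
    """Compute W0 x + (alpha / r) * B (A x), where r is the LoRA rank.

    W0 is the frozen pretrained weight (d_out x d_in).
    A (r x d_in) and B (d_out x r) are the only trainable matrices.
    """
    r = len(A)
    base = matvec(W0, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def trainable_params(d_out, d_in, r):
    """Number of trainable values in the low-rank pair (A, B)."""
    return r * (d_in + d_out)


# Hypothetical sizes, chosen only to keep the example small.
random.seed(0)
d_out, d_in, r, alpha = 6, 8, 2, 4.0

W0 = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]  # zero init: the update starts at zero
x = [random.gauss(0, 1) for _ in range(d_in)]

y = lora_forward(W0, A, B, x, alpha)
full_params = d_out * d_in
lora_params = trainable_params(d_out, d_in, r)
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen model's output exactly, and fine-tuning only has to learn the small matrices `A` and `B` rather than the full weight matrix. That reduction in trainable parameters is what makes tuning open-source LLMs comparatively cheap.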
