GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction

Abstract

This paper aims to efficiently enable Large Language Models (LLMs) to use multimodal tools. Advanced proprietary LLMs, such as ChatGPT and GPT-4, have shown great potential for tool usage through sophisticated prompt engineering. Nevertheless, these models typically incur prohibitive computational costs and rely on publicly inaccessible data. To address these challenges, we propose GPT4Tools, a self-instruct-based method that enables open-source LLMs, such as LLaMA and OPT, to use tools. It generates an instruction-following dataset by prompting an advanced teacher model with various multimodal contexts. Using Low-Rank Adaptation (LoRA) optimization, our approach enables open-source LLMs to solve a range of visual problems, including visual comprehension and image generation. Moreover, we provide a benchmark that evaluates the ability of LLMs to use tools in both zero-shot and fine-tuned settings. Extensive experiments demonstrate that the method is effective across various language models: it not only significantly improves the accuracy of invoking seen tools but also enables zero-shot use of unseen tools. The code and demo are available at https://github.com/StevenGrove/GPT4Tools.
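The abstract names Low-Rank Adaptation (LoRA) as the optimization used but does not describe it further. As a rough illustration only, and not the authors' implementation, the following pure-Python sketch shows the general LoRA idea: a frozen weight matrix W is augmented with a trainable low-rank product B·A, scaled by alpha / rank. The class name `LoRALinear`, the helper `matvec`, and the parameters `rank`, `alpha`, and `seed` are hypothetical names chosen for this example.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A.

    Illustrative sketch under stated assumptions; names are hypothetical.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # pretrained weight, kept frozen
        # A (rank x d_in) gets a small random init; B (d_out x rank) starts
        # at zero, so the adapted layer initially matches the base layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)               # W x
        update = matvec(self.B, matvec(self.A, x))  # B (A x)
        return [b + self.scale * u for b, u in zip(base, update)]


weight = [
    [1.0, 2.0, 3.0, 4.0],
    [0.5, -1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 1.0],
]
layer = LoRALinear(weight, rank=2, alpha=4.0)
x = [1.0, 0.0, -1.0, 2.0]
initial_output = layer.forward(x)  # equals W x, because B is all zeros
```

During fine-tuning only A and B would receive gradient updates, which is what keeps the adaptation cheap relative to updating W itself; the training loop is omitted here.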
