TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems

Abstract

Large Language Models (LLMs) have demonstrated proficiency in addressing tasks that require a combination of task planning and the use of external tools, such as APIs. However, complex real-world systems present three prevalent challenges for task planning and tool usage: (1) a real system usually has a vast number of APIs, so it is impossible to feed the descriptions of all APIs into the LLM prompt, because the token length is limited; (2) a real system is designed to handle complex tasks, and base LLMs can hardly plan a correct sub-task order and API-calling order for such tasks; (3) APIs in real systems often have similar semantics and functionality, which makes them hard to distinguish for LLMs and even for humans. In response, this paper introduces a comprehensive framework aimed at enhancing the Task Planning and Tool Usage (TPTU) abilities of LLM-based agents operating within real-world systems. The framework comprises three key components designed to address these challenges: (1) the API Retriever selects the APIs most relevant to the user task from the extensive set available; (2) the LLM Finetuner tunes a base LLM so that the finetuned LLM is more capable of task planning and API calling; (3) the Demo Selector adaptively retrieves different demonstrations related to hard-to-distinguish APIs, which are then used for in-context learning to boost final performance. We validate our methods on a real-world commercial system as well as an open-source academic dataset, and the results clearly show the efficacy of each individual component and of the integrated framework.
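
To make the role of the API Retriever concrete, the following is a minimal sketch of top-k API selection under a limited prompt budget. It is not the paper's implementation: the abstract does not say how relevance is scored, so a bag-of-words cosine similarity stands in for whatever retrieval model the authors use. The function names (`tokenize`, `cosine`, `retrieve_apis`) and the API catalogue `API_DOCS` are hypothetical and invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase bag-of-words counts; an assumed stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_apis(task, api_docs, k=2):
    """Return the names of the k APIs whose descriptions best match the task."""
    task_vec = tokenize(task)
    ranked = sorted(
        api_docs,
        key=lambda name: cosine(task_vec, tokenize(api_docs[name])),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical API catalogue (assumption, not taken from the paper).
API_DOCS = {
    "get_camera_status": "query the online status of a camera device",
    "list_alarm_events": "list alarm events recorded in a time range",
    "create_user": "create a new user account with a role",
}

task = "show alarm events from the last hour"
selected = retrieve_apis(task, API_DOCS, k=1)

# Only the selected descriptions are placed in the prompt, not the full catalogue.
prompt = "\n".join(f"{name}: {API_DOCS[name]}" for name in selected)
```

Only the retrieved descriptions enter the prompt, which is how a retriever of this kind sidesteps the token-length limit named in challenge (1). A Demo Selector could be sketched the same way, ranking stored demonstrations instead of API descriptions, though the abstract leaves its scoring method unspecified as well.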
