Octo-planner: On-device Language Model for Planner-Action Agents

Abstract

AI agents have become increasingly important across many domains because they enable autonomous decision-making and problem-solving. To function effectively, these agents need a planning process that determines the best course of action and then executes the planned steps. In this paper, we present an efficient on-device Planner-Action framework that separates planning and action execution into two distinct components: a planner agent based on Phi-3 Mini, a 3.8-billion-parameter LLM optimized for edge devices, and an action agent that uses the Octopus model for function execution. The planner agent first responds to a user query by decomposing the task into a sequence of sub-steps, which the action agent then executes. To optimize performance on resource-constrained devices, we use model fine-tuning instead of in-context learning, which reduces computational cost and energy consumption while improving response times. Our approach uses GPT-4 to generate diverse planning queries and responses based on the available functions, followed by validation to ensure data quality. We fine-tune the Phi-3 Mini model on this curated dataset and achieve a 97% success rate in our in-domain test environment. To address multi-domain planning challenges, we developed a multi-LoRA training method that merges weights from LoRAs trained on distinct function subsets. This approach enables flexible handling of complex, multi-domain queries while maintaining computational efficiency on resource-constrained devices. To support further research, we have open-sourced our model weights at https://huggingface.co/NexaAIDev/octopus-planning. For the demo, please refer to https://www.nexa4ai.com/octo-planner.
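The abstract does not say how the LoRA weights are merged, so the following is only a minimal sketch of one common option: adding a weighted sum of the low-rank updates B_i A_i to a frozen base weight matrix. The names `matmul` and `merge_loras`, the mixing weights, and the toy matrices are all hypothetical and are not taken from the released model or its training code.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (m x k) and b (k x n)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_loras(
    base: Matrix,
    adapters: Sequence[Tuple[Matrix, Matrix]],
    weights: Sequence[float],
) -> Matrix:
    """Return base + sum_i weights[i] * (B_i @ A_i).

    Assumption: each adapter is a pair (B, A) of low-rank factors whose
    product has the same shape as `base`. This weighted-sum rule is an
    illustrative choice, not necessarily the method used in the paper.
    """
    if len(adapters) != len(weights):
        raise ValueError("need exactly one mixing weight per adapter")
    merged = [row[:] for row in base]  # copy so the base matrix stays untouched
    for (b, a), w in zip(adapters, weights):
        delta = matmul(b, a)  # full-size update reconstructed from rank-r factors
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += w * value
    return merged
```

For example, with a 2x2 identity base and two rank-1 adapters that stand in for two function subsets, `merge_loras(base, [(b1, a1), (b2, a2)], [0.5, 0.5])` yields a single weight matrix that can be served without loading either adapter separately at inference time.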
