Octo-planner: On-device Language Model for Planner-Action Agents
June 26, 2024
Authors: Wei Chen, Zhiyuan Li, Zhen Guo, Yikang Shen
cs.AI
Abstract
AI agents have become increasingly significant in various domains, enabling
autonomous decision-making and problem-solving. To function effectively, these
agents require a planning process that determines the best course of action and
then executes the planned actions. In this paper, we present an efficient
on-device Planner-Action framework that separates planning and action execution
into two distinct components: a planner agent based on Phi-3 Mini, a 3.8
billion parameter LLM optimized for edge devices, and an action agent using the
Octopus model for function execution. The planner agent first responds to user
queries by decomposing tasks into a sequence of sub-steps, which are then
executed by the action agent. To optimize performance on resource-constrained
devices, we employ model fine-tuning instead of in-context learning, reducing
computational costs and energy consumption while improving response times. Our
approach involves using GPT-4 to generate diverse planning queries and
responses based on available functions, with subsequent validations to ensure
data quality. We fine-tune the Phi-3 Mini model on this curated dataset,
achieving a 97% success rate in our in-domain test environment. To address
multi-domain planning challenges, we developed a multi-LoRA training method
that merges weights from LoRAs trained on distinct function subsets. This
approach enables flexible handling of complex, multi-domain queries while
maintaining computational efficiency on resource-constrained devices. To
support further research, we have open-sourced our model weights at
https://huggingface.co/NexaAIDev/octopus-planning. For the demo, please
refer to https://www.nexa4ai.com/octo-planner.
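To make the planner-action split concrete, here is a minimal sketch of the two-stage pipeline, assuming the open-sourced planner checkpoint and a standard Hugging Face `transformers` setup. The chat formatting, the one-sub-step-per-line output convention, and the `octopus_execute` stub are illustrative assumptions, not the paper's actual interface.

```python
# Minimal sketch of the Planner-Action pipeline: the planner agent decomposes
# a user query into sub-steps, and an action agent executes each one.
# NOTE: the output format and the action-agent stub below are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

PLANNER_ID = "NexaAIDev/octopus-planning"  # fine-tuned Phi-3 Mini planner from the paper

tok = AutoTokenizer.from_pretrained(PLANNER_ID)
planner = AutoModelForCausalLM.from_pretrained(PLANNER_ID)

def plan(query: str) -> list[str]:
    """Planner agent: decompose a user query into a sequence of sub-steps."""
    inputs = tok.apply_chat_template(
        [{"role": "user", "content": query}],
        add_generation_prompt=True,
        return_tensors="pt",
    )
    out = planner.generate(inputs, max_new_tokens=256)
    text = tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
    # Assume the planner emits one sub-step per line (hypothetical format).
    return [line.strip() for line in text.splitlines() if line.strip()]

def octopus_execute(step: str) -> str:
    """Action agent stub: in the paper this is the Octopus model, which maps
    each sub-step to a concrete device function call."""
    return f"<executed: {step}>"

for step in plan("Find a nearby coffee shop and text the address to Alice"):
    print(step, "->", octopus_execute(step))
```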
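The multi-LoRA merging step can likewise be sketched with the `peft` library's `add_weighted_adapter` utility. The adapter paths and equal weights below are hypothetical placeholders for LoRAs trained on distinct function subsets; the paper's exact merging procedure may differ.

```python
# Sketch of multi-LoRA weight merging: combine domain-specific adapters into
# one adapter that can serve multi-domain queries on-device.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

# Load two domain-specific LoRAs (placeholder paths).
model = PeftModel.from_pretrained(base, "path/to/lora-email", adapter_name="email")
model.load_adapter("path/to/lora-calendar", adapter_name="calendar")

# Merge the adapter weights into a single multi-domain adapter and activate it.
model.add_weighted_adapter(
    adapters=["email", "calendar"],
    weights=[0.5, 0.5],
    adapter_name="multi_domain",
    combination_type="linear",
)
model.set_adapter("multi_domain")
```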