Octo-planner: On-device Language Model for Planner-Action Agents
June 26, 2024
Authors: Wei Chen, Zhiyuan Li, Zhen Guo, Yikang Shen
cs.AI
Abstract
AI agents have become increasingly significant in various domains, enabling
autonomous decision-making and problem-solving. To function effectively, these
agents require a planning process that determines the best course of action and
then executes the planned actions. In this paper, we present an efficient
on-device Planner-Action framework that separates planning and action execution
into two distinct components: a planner agent based on Phi-3 Mini, a 3.8
billion parameter LLM optimized for edge devices, and an action agent using the
Octopus model for function execution. The planner agent first responds to user
queries by decomposing tasks into a sequence of sub-steps, which are then
executed by the action agent. To optimize performance on resource-constrained
devices, we employ model fine-tuning instead of in-context learning, reducing
computational costs and energy consumption while improving response times. Our
approach involves using GPT-4 to generate diverse planning queries and
responses based on available functions, with subsequent validations to ensure
data quality. We fine-tune the Phi-3 Mini model on this curated dataset,
achieving a 97% success rate in our in-domain test environment. To address
multi-domain planning challenges, we develop a multi-LoRA training method
that merges weights from LoRAs trained on distinct function subsets. This
approach enables flexible handling of complex, multi-domain queries while
maintaining computational efficiency on resource-constrained devices. To
support further research, we have open-sourced our model weights at
https://huggingface.co/NexaAIDev/octopus-planning. For the demo, please
refer to https://www.nexa4ai.com/octo-planner.
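The two-stage flow the abstract describes, where a planner decomposes a query into sub-steps that an action agent then executes, can be sketched as follows. This is a minimal illustration with both model calls stubbed out; the function names, decomposition format, and outputs are assumptions, not the paper's actual prompts or interfaces.

```python
# Minimal sketch of the Planner-Action separation: a planner agent
# decomposes the user query; an action agent executes each sub-step.
# All names and outputs here are illustrative stand-ins.

def planner_agent(query: str) -> list[str]:
    """Stand-in for the fine-tuned Phi-3 Mini planner: decompose a
    user query into an ordered sequence of sub-steps."""
    # A real planner would generate these steps with the LLM;
    # we hard-code a two-step decomposition for illustration.
    return [
        f"Step 1: identify the device functions relevant to: {query}",
        "Step 2: invoke each selected function with inferred arguments",
    ]

def action_agent(step: str) -> str:
    """Stand-in for the Octopus action model: map one sub-step to a
    concrete function execution."""
    return f"executed: {step}"

def run(query: str) -> list[str]:
    steps = planner_agent(query)              # planning stage
    return [action_agent(s) for s in steps]   # execution stage

for result in run("dim the screen and enable do-not-disturb"):
    print(result)
```

Keeping the two stages behind separate interfaces is what allows each component to be fine-tuned or swapped independently, as the paper does with Phi-3 Mini and Octopus.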
AI-Generated Summary
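The multi-LoRA merging idea mentioned in the abstract, combining adapters trained on distinct function subsets, can be illustrated with low-rank weight deltas. This sketch assumes the standard LoRA parameterization (delta = B @ A per adapter) and an equal-weighted sum as the merge rule; the paper's exact merging procedure may differ.

```python
import numpy as np

# Illustrative merge of several LoRA adapters into one base weight.
# Each adapter contributes a low-rank update B_i @ A_i; the merged
# weight is W + sum_i w_i * (B_i @ A_i). The equal merge weights are
# an assumption for this sketch.

rng = np.random.default_rng(0)
d, r = 8, 2                        # hidden size, LoRA rank
W = rng.standard_normal((d, d))    # frozen base weight matrix

# Three adapters, e.g. one per function domain, as (B, A) factor pairs.
adapters = [
    (rng.standard_normal((d, r)), rng.standard_normal((r, d)))
    for _ in range(3)
]
merge_weights = [1.0 / len(adapters)] * len(adapters)

delta = sum(w * B @ A for w, (B, A) in zip(merge_weights, adapters))
W_merged = W + delta               # single merged weight for inference

print(W_merged.shape)
```

Folding the adapters into a single weight matrix at load time keeps inference cost identical to the base model, which matches the paper's emphasis on computational efficiency for resource-constrained devices.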