Octopus v2: On-device language model for super agent
April 2, 2024
Authors: Wei Chen, Zhiyuan Li
cs.AI
Abstract
Language models have shown effectiveness in a variety of software applications, particularly in tasks related to workflow automation. These models possess the crucial ability to call functions, which is essential in creating AI agents. Despite the high performance of large-scale language models in cloud environments, they are often associated with concerns over privacy and cost. Current on-device models for function calling face issues with latency and accuracy. Our research presents a new method that empowers an on-device model with 2 billion parameters to surpass GPT-4 in both accuracy and latency while decreasing the context length by 95%. Compared to Llama-7B with a RAG-based function calling mechanism, our method reduces latency 35-fold, bringing it to levels suitable for deployment on a variety of edge devices in production environments and meeting the performance requirements of real-world applications.
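
To make the function-calling task concrete, the sketch below is a minimal, hypothetical illustration of the kind of agent loop the abstract describes: a small on-device model maps a natural-language query to a single structured function call, which is then parsed and dispatched. The model is replaced by a stub, and the function names (get_weather, set_alarm) and the generate helper are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of an on-device function-calling agent loop.
# The language model is stubbed out; a real system would run a small
# (~2B-parameter) model in its place.
import re

# Hypothetical callable functions exposed to the agent.
REGISTRY = {
    "get_weather": lambda city: f"Sunny in {city}",
    "set_alarm": lambda time: f"Alarm set for {time}",
}

def generate(prompt: str) -> str:
    """Stand-in for the on-device language model: returns a single
    function call as text for the given prompt."""
    if "weather" in prompt.lower():
        return 'get_weather(city="Boston")'
    return 'set_alarm(time="7:00 AM")'

def run_agent(query: str) -> str:
    prompt = f"Query: {query}\nRespond with one function call."
    call = generate(prompt)
    # Parse `name(arg="value", ...)` into a dispatchable call.
    match = re.fullmatch(r"(\w+)\((.*)\)", call.strip())
    name, arg_str = match.group(1), match.group(2)
    kwargs = dict(re.findall(r'(\w+)="([^"]*)"', arg_str))
    return REGISTRY[name](**kwargs)

print(run_agent("What's the weather like in Boston?"))
```

In the paper's setting, the stubbed generate step is where the 2-billion-parameter on-device model would run, and the baselines compared in the abstract (GPT-4, Llama-7B with RAG-based function calling) would fill the same role in the loop.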