Octopus v2: On-device language model for super agent
April 2, 2024
Authors: Wei Chen, Zhiyuan Li
cs.AI
Abstract
Language models have shown effectiveness in a variety of software applications, particularly in tasks related to automatic workflow. These models possess the crucial ability to call functions, which is essential in creating AI agents. Despite the high performance of large-scale language models in cloud environments, they are often associated with concerns over privacy and cost. Current on-device models for function calling face issues with latency and accuracy. Our research presents a new method that empowers an on-device model with 2 billion parameters to surpass GPT-4 in both accuracy and latency, and to decrease the context length by 95%. When compared to Llama-7B with a RAG-based function calling mechanism, our method improves latency by 35-fold. This method reduces latency to levels deemed suitable for deployment across a variety of edge devices in production environments, aligning with the performance requirements of real-world applications.
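
To make the function-calling capability described in the abstract concrete, the sketch below shows, in Python, how a model's output might be routed to an actual on-device function. All names here (FUNCTIONS, route_query, the JSON call format) are illustrative assumptions for exposition, not the paper's actual API or prompt format.

import json

# Candidate device functions an agent could invoke (hypothetical examples).
FUNCTIONS = {
    "set_alarm": lambda hour, minute: f"Alarm set for {hour:02d}:{minute:02d}",
    "send_message": lambda recipient, text: f"Sent '{text}' to {recipient}",
}

def route_query(model_output: str) -> str:
    """Parse a JSON-formatted function call emitted by the model and run it.

    `model_output` stands in for the text an on-device function-calling
    model would decode, e.g. '{"name": "set_alarm", "arguments": {...}}'.
    """
    call = json.loads(model_output)
    fn = FUNCTIONS[call["name"]]      # look up the named function
    return fn(**call["arguments"])    # apply the model-supplied arguments

# Example: suppose the model emitted this call for "wake me up at 6:30".
print(route_query('{"name": "set_alarm", "arguments": {"hour": 6, "minute": 30}}'))

The property the paper targets is that this mapping from a natural-language query to a structured call is produced by a small (2-billion-parameter) model running on the device itself rather than in the cloud, which is where the stated latency and privacy advantages come from.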