Octopus v2: On-device language model for super agent

Abstract

Language models have proven effective in a variety of software applications, particularly in tasks related to automated workflows. These models can call functions, a capability that is essential for building AI agents. Although large language models perform well in cloud environments, they often raise concerns about privacy and cost. Current on-device models for function calling, in turn, suffer from high latency and low accuracy. Our research presents a new method that enables an on-device model with 2 billion parameters to surpass GPT-4 in both accuracy and latency while reducing the context length by 95%. Compared with Llama-7B using a RAG-based function calling mechanism, our method improves latency 35-fold. This brings latency down to levels suitable for deployment across a variety of edge devices in production environments, meeting the performance requirements of real-world applications.