Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models

Abstract

This paper presents Dolphin, a novel decoder-decoder architecture for energy-efficient processing of long contexts in language models. Our approach addresses the significant energy consumption and latency challenges inherent in on-device models. Dolphin employs a compact 0.5B-parameter decoder to distill extensive contextual information into a memory embedding, substantially reducing the input length for the primary 7B-parameter decoder. Inspired by vision-language models, we repurpose the image embedding projector to encode long textual contexts, effectively treating extended context as a distinct modality. This method enables the processing of substantially longer contexts without the computational overhead typically associated with long input sequences. Empirical evaluations show a 10-fold improvement in energy efficiency and a 5-fold reduction in latency compared with conventional full-length context processing, without loss of response quality. Our work contributes to more sustainable and scalable language models for on-device applications, addressing the need for energy-efficient, responsive AI in resource-constrained environments while preserving accurate understanding of long contexts. More broadly, this research bears on efficient model design for resource-limited settings within natural language processing. By enabling more sophisticated AI capabilities on edge devices, Dolphin paves the way for advanced language processing in applications where computational resources are at a premium. The Dolphin model is publicly available at https://huggingface.co/NexaAIDev/Dolphin.
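The data flow described above (a small decoder compresses the context, a projector maps the result into the large decoder's embedding space, and the large decoder then sees only a short input) can be illustrated with a minimal, dependency-free sketch. The abstract does not specify how the memory embedding is extracted or how the projector is shaped, so everything here is an assumption: we assume the small decoder reads the context followed by `num_memory` special memory tokens and that only the hidden states at those positions are kept, and we use a LLaVA-style two-layer GELU MLP as the projector. The names `MemoryProjector`, `memory_embeddings`, `prefill_flops` and `num_memory` are hypothetical, random vectors stand in for real hidden states, and the FLOP count uses the common "about 2 FLOPs per parameter per token" rule of thumb. It is a back-of-envelope illustration, not the paper's energy or latency measurement.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def gelu(x: float) -> float:
    # Exact GELU, a common activation in LLaVA-style MLP projectors.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(x: Vector, w: Matrix, b: Vector) -> Vector:
    # y = W x + b, where each row of W has len(x) entries.
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


class MemoryProjector:
    """Hypothetical two-layer MLP mapping encoder states (d_enc) to decoder embeddings (d_dec)."""

    def __init__(self, d_enc: int, d_dec: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1, self.b1 = random_matrix(d_dec, d_enc, rng), [0.0] * d_dec
        self.w2, self.b2 = random_matrix(d_dec, d_dec, rng), [0.0] * d_dec

    def __call__(self, h: Vector) -> Vector:
        hidden = [gelu(v) for v in linear(h, self.w1, self.b1)]
        return linear(hidden, self.w2, self.b2)


def memory_embeddings(encoder_states: List[Vector], num_memory: int,
                      projector: MemoryProjector) -> List[Vector]:
    # Assumption: the small decoder reads the context followed by num_memory
    # special memory tokens; only the states at those final positions are kept.
    return [projector(h) for h in encoder_states[-num_memory:]]


def prefill_flops(params: float, tokens: int) -> float:
    # Rule of thumb: ~2 FLOPs per parameter per token for one forward pass
    # (ignores the attention term that grows quadratically with length).
    return 2.0 * params * tokens


# Toy usage: random vectors stand in for real model hidden states.
rng = random.Random(42)
context_len, num_memory, query_len = 1000, 16, 20
d_enc, d_dec = 8, 12  # illustrative sizes, far smaller than real models

encoder_states = [[rng.gauss(0.0, 1.0) for _ in range(d_enc)]
                  for _ in range(context_len + num_memory)]
query_embeds = [[rng.gauss(0.0, 1.0) for _ in range(d_dec)] for _ in range(query_len)]

memory = memory_embeddings(encoder_states, num_memory, MemoryProjector(d_enc, d_dec))
decoder_input = memory + query_embeds  # memory tokens play the role of image tokens

full_flops = prefill_flops(7e9, context_len + query_len)
split_flops = (prefill_flops(0.5e9, context_len + num_memory)
               + prefill_flops(7e9, len(decoder_input)))

print(len(decoder_input), "decoder positions instead of", context_len + query_len)
print(f"rough prefill FLOP ratio: {full_flops / split_flops:.1f}x")
```

In this toy setting the 7B decoder processes 36 positions instead of 1,020, and most of the remaining compute is the 0.5B decoder reading the full context. The estimate ignores quadratic attention cost, KV caching, memory bandwidth and the cost of training the projector, so it shows why shortening the large model's input helps rather than reproducing the reported 10-fold and 5-fold figures.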
