MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

Abstract

This paper addresses the growing need for efficient large language models (LLMs) on mobile devices, driven by increasing cloud costs and latency concerns. We focus on designing top-quality LLMs with fewer than a billion parameters, a practical choice for mobile deployment. Contrary to the prevailing belief that data and parameter quantity play the pivotal role in determining model quality, our investigation underscores the significance of model architecture for sub-billion scale LLMs. Leveraging deep and thin architectures, coupled with embedding sharing and grouped-query attention mechanisms, we establish a strong baseline network denoted as MobileLLM, which attains a 2.7%/4.3% accuracy boost over the preceding 125M/350M state-of-the-art models. Additionally, we propose an immediate block-wise weight sharing approach with no increase in model size and only marginal latency overhead. The resulting models, denoted as MobileLLM-LS, demonstrate a further accuracy improvement of 0.7%/0.8% over MobileLLM 125M/350M. Moreover, the MobileLLM model family shows significant improvements over previous sub-billion models on chat benchmarks and demonstrates correctness close to that of LLaMA-v2 7B in API calling tasks, highlighting the capability of small models for common on-device use cases.
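The abstract does not spell out how immediate block-wise weight sharing works. The sketch below is a toy illustration under one assumption: that "immediate" sharing means each block is applied twice in a row with the same weights, so the effective depth doubles while the stored parameter count stays fixed. `ToyBlock`, `forward`, and `count_params` are hypothetical names introduced here, and the element-wise residual update is only a stand-in for a real transformer block.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyBlock:
    """Stand-in for a transformer block: one weight vector, residual update."""
    weights: List[float]

    def __call__(self, x: List[float]) -> List[float]:
        # Residual connection around an element-wise scale (illustrative only).
        return [xi + wi * xi for xi, wi in zip(x, self.weights)]

    def num_params(self) -> int:
        return len(self.weights)


def forward(blocks: List[ToyBlock], x: List[float], repeats: int = 1) -> List[float]:
    """Apply each block `repeats` times in a row before moving to the next."""
    for block in blocks:
        for _ in range(repeats):  # the same weights are reused immediately
            x = block(x)
    return x


def count_params(blocks: List[ToyBlock]) -> int:
    # Shared weights are stored once, so repeats do not add parameters.
    return sum(b.num_params() for b in blocks)


blocks = [ToyBlock([0.5, 0.5]), ToyBlock([1.0, 1.0])]
baseline = forward(blocks, [1.0, 2.0], repeats=1)  # 2 block applications
shared = forward(blocks, [1.0, 2.0], repeats=2)    # 4 block applications
print(baseline, shared, count_params(blocks))
```

In this toy setup the parameter count is identical for both calls. Only the number of block applications changes, which is consistent with the abstract's claim of no increase in model size at the cost of some extra compute.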
