Speed Always Wins: A Survey on Efficient Architectures for Large Language Models

Abstract

Large Language Models (LLMs) have delivered impressive results in language understanding, generation, and reasoning, and have pushed the capability boundary of multimodal models. Transformer models, as the foundation of modern LLMs, offer a strong baseline with excellent scaling properties. However, the traditional Transformer architecture requires substantial computation and poses significant obstacles for large-scale training and practical deployment. In this survey, we offer a systematic examination of innovative LLM architectures that address the inherent limitations of Transformers and improve efficiency. Starting from language modeling, this survey covers the background and technical details of linear and sparse sequence modeling methods, efficient full attention variants, sparse mixture-of-experts, hybrid model architectures incorporating the above techniques, and emerging diffusion LLMs. Additionally, we discuss applications of these techniques to other modalities and consider their wider implications for developing scalable, resource-aware foundation models. By grouping recent studies into the above categories, this survey presents a blueprint of modern efficient LLM architectures, and we hope it helps motivate future research toward more efficient, versatile AI systems.
