Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
August 13, 2025
Authors: Weigao Sun, Jiaxi Hu, Yucheng Zhou, Jusen Du, Disen Lan, Kexin Wang, Tong Zhu, Xiaoye Qu, Yu Zhang, Xiaoyu Mo, Daizong Liu, Yuxuan Liang, Wenliang Chen, Guoqi Li, Yu Cheng
cs.AI
Abstract
Large Language Models (LLMs) have delivered impressive results in language understanding, generation, and reasoning, and have pushed the capability boundaries of multimodal models. Transformer models, the foundation of modern LLMs, offer a strong baseline with excellent scaling properties. However, the traditional transformer architecture requires substantial computation and poses significant obstacles to large-scale training and practical deployment. In this survey, we offer a systematic examination of innovative LLM architectures that address the inherent limitations of transformers and boost efficiency. Starting from language modeling, the survey covers the background and technical details of linear and sparse sequence modeling methods, efficient full-attention variants, sparse mixture-of-experts, hybrid model architectures incorporating the above techniques, and emerging diffusion LLMs. Additionally, we discuss applications of these techniques to other modalities and consider their wider implications for developing scalable, resource-aware foundation models. By grouping recent studies into the above categories, this survey presents a blueprint of modern efficient LLM architectures, and we hope it will help motivate future research toward more efficient, versatile AI systems.