

Speed Always Wins: A Survey on Efficient Architectures for Large Language Models

August 13, 2025
Authors: Weigao Sun, Jiaxi Hu, Yucheng Zhou, Jusen Du, Disen Lan, Kexin Wang, Tong Zhu, Xiaoye Qu, Yu Zhang, Xiaoyu Mo, Daizong Liu, Yuxuan Liang, Wenliang Chen, Guoqi Li, Yu Cheng
cs.AI

Abstract

Large Language Models (LLMs) have delivered impressive results in language understanding, generation, and reasoning, and have pushed the capability boundary of multimodal models. Transformer models, as the foundation of modern LLMs, offer a strong baseline with excellent scaling properties. However, the traditional Transformer architecture requires substantial computation, posing significant obstacles to large-scale training and practical deployment. In this survey, we offer a systematic examination of innovative LLM architectures that address the inherent limitations of Transformers and boost efficiency. Starting from language modeling, the survey covers the background and technical details of linear and sparse sequence modeling methods, efficient full-attention variants, sparse mixture-of-experts, hybrid model architectures that combine the above techniques, and emerging diffusion LLMs. Additionally, we discuss applications of these techniques to other modalities and consider their wider implications for developing scalable, resource-aware foundation models. By grouping recent studies into the above categories, this survey presents a blueprint of modern efficient LLM architectures, and we hope it will help motivate future research toward more efficient, versatile AI systems.
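
As a rough illustration of the efficiency gap these architectures target (a minimal sketch for intuition, not code from the paper), the snippet below contrasts standard softmax attention, which materializes an N×N score matrix, with a kernelized linear-attention formulation that only accumulates a d×d summary state. The feature map `phi` and the shapes are illustrative assumptions.

```python
# Minimal sketch (illustrative, not from the survey): O(N^2) softmax attention
# versus an O(N) linear-attention style computation over a sequence of length N
# with head dimension d.
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: the (N, N) score matrix dominates compute and memory."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])                    # (N, N)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                         # (N, d)

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Kernelized linear attention: accumulate a (d, d) state, never form (N, N)."""
    Qf, Kf = phi(Q), phi(K)
    S = Kf.T @ V                                               # (d, d) key-value summary
    z = Kf.sum(axis=0)                                         # (d,)  normalizer
    return (Qf @ S) / (Qf @ z)[:, None]                        # (N, d)

if __name__ == "__main__":
    N, d = 1024, 64
    Q, K, V = (np.random.randn(N, d) for _ in range(3))
    print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```

For long sequences (N much larger than d), avoiding the N×N matrix is what turns the quadratic cost in sequence length into a linear one, which is the kind of trade-off the surveyed linear and sparse sequence modeling methods explore.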