Configurable Foundation Models: Building LLMs from a Modular Perspective
September 4, 2024
Authors: Chaojun Xiao, Zhengyan Zhang, Chenyang Song, Dazhi Jiang, Feng Yao, Xu Han, Xiaozhi Wang, Shuo Wang, Yufei Huang, Guanyu Lin, Yingfa Chen, Weilin Zhao, Yuge Tu, Zexuan Zhong, Ao Zhang, Chenglei Si, Khai Hao Moo, Chenyang Zhao, Huimin Chen, Yankai Lin, Zhiyuan Liu, Jingbo Shang, Maosong Sun
cs.AI
Abstract
Recent advancements in LLMs have unveiled challenges tied to computational
efficiency and continual scalability, stemming from their requirement for
vast numbers of parameters. This makes it increasingly cumbersome to apply
and evolve these models on devices with limited computational resources, or
in scenarios demanding diverse abilities. Inspired by the modularity of the
human brain, there is a growing tendency to decompose LLMs into numerous
functional modules, allowing inference with only a subset of modules and
dynamic assembly of modules to tackle complex tasks, as in
mixture-of-experts. To highlight the inherent
efficiency and composability of the modular approach, we coin the term brick to
represent each functional module, designating the modularized structure as
configurable foundation models. In this paper, we offer a comprehensive
overview and investigation of the construction, utilization, and limitations
of configurable foundation models. We first formalize modules into two
categories: emergent bricks, functional neuron partitions that arise during
the pre-training phase, and customized bricks, which are constructed via
additional post-training to extend the capabilities and knowledge of LLMs.
Based on these diverse functional bricks, we
further present four brick-oriented operations: retrieval and routing, merging,
updating, and growing. These operations allow for dynamic configuration of LLMs
based on instructions to handle complex tasks. To verify our perspective, we
conduct an empirical analysis on widely-used LLMs. We find that the FFN layers
follow modular patterns with functional specialization of neurons and
functional neuron partitions. Finally, we highlight several open issues and
directions for future research. Overall, this paper aims to offer a fresh
modular perspective on existing LLM research and inspire the future creation of
more efficient and scalable foundation models.
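Two of the brick-oriented operations named in the abstract, retrieval/routing and merging, can be illustrated with a toy sketch. The following is a minimal, hypothetical illustration, not the paper's actual method or API: it assumes simple ReLU feed-forward "bricks", a softmax top-k gate for routing (as in mixture-of-experts), and plain parameter averaging for merging. The names `Brick`, `route`, and `merge_bricks` are ours, chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

class Brick:
    """A tiny ReLU FFN 'brick': one functional module of the model."""
    def __init__(self, d_model: int, d_hidden: int):
        self.w_in = rng.standard_normal((d_model, d_hidden)) * 0.02
        self.w_out = rng.standard_normal((d_hidden, d_model)) * 0.02

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # Standard two-layer FFN with ReLU activation
        return np.maximum(x @ self.w_in, 0.0) @ self.w_out

def route(x, bricks, gate_w, k=2):
    """Routing: score all bricks with a linear gate, activate only the
    top-k, and combine their outputs with softmax weights."""
    scores = x @ gate_w                        # (n_bricks,) gate scores
    top = np.argsort(scores)[-k:]              # indices of the top-k bricks
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()                   # softmax over selected bricks
    return sum(w * bricks[i](x) for w, i in zip(weights, top))

def merge_bricks(bricks):
    """Merging: collapse several bricks into one by averaging parameters."""
    merged = Brick.__new__(Brick)              # skip random init
    merged.w_in = np.mean([b.w_in for b in bricks], axis=0)
    merged.w_out = np.mean([b.w_out for b in bricks], axis=0)
    return merged

d_model, d_hidden, n_bricks = 8, 16, 4
bricks = [Brick(d_model, d_hidden) for _ in range(n_bricks)]
gate_w = rng.standard_normal((d_model, n_bricks))

x = rng.standard_normal(d_model)
y_routed = route(x, bricks, gate_w, k=2)  # inference with a subset of bricks
y_merged = merge_bricks(bricks)(x)        # inference with one merged brick
```

Routing activates only `k` of the `n_bricks` modules per input, which is where the efficiency and composability of the modular approach come from; merging instead trades that sparsity for a single consolidated module.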