
Configurable Foundation Models: Building LLMs from a Modular Perspective

September 4, 2024
作者: Chaojun Xiao, Zhengyan Zhang, Chenyang Song, Dazhi Jiang, Feng Yao, Xu Han, Xiaozhi Wang, Shuo Wang, Yufei Huang, Guanyu Lin, Yingfa Chen, Weilin Zhao, Yuge Tu, Zexuan Zhong, Ao Zhang, Chenglei Si, Khai Hao Moo, Chenyang Zhao, Huimin Chen, Yankai Lin, Zhiyuan Liu, Jingbo Shang, Maosong Sun
cs.AI

Abstract

Recent advancements in LLMs have unveiled challenges tied to computational efficiency and continual scalability, stemming from their enormous parameter counts. This makes the application and evolution of these models increasingly cumbersome on devices with limited computational resources and in scenarios requiring diverse abilities. Inspired by modularity within the human brain, there is a growing tendency to decompose LLMs into numerous functional modules, allowing inference with a subset of modules and dynamic assembly of modules to tackle complex tasks, as in mixture-of-experts architectures. To highlight the inherent efficiency and composability of the modular approach, we coin the term brick to represent each functional module, designating the modularized structure as configurable foundation models. In this paper, we offer a comprehensive overview and investigation of the construction, utilization, and limitations of configurable foundation models. We first formalize modules into emergent bricks - functional neuron partitions that emerge during the pre-training phase - and customized bricks - bricks constructed via additional post-training to improve the capabilities and knowledge of LLMs. Based on diverse functional bricks, we further present four brick-oriented operations: retrieval and routing, merging, updating, and growing. These operations allow for dynamic configuration of LLMs based on instructions to handle complex tasks. To verify our perspective, we conduct an empirical analysis on widely used LLMs. We find that the FFN layers follow modular patterns, with functional specialization of neurons and functional neuron partitions. Finally, we highlight several open issues and directions for future research. Overall, this paper aims to offer a fresh modular perspective on existing LLM research and inspire the future creation of more efficient and scalable foundation models.
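To make two of the four brick-oriented operations concrete, the following is a minimal, hypothetical sketch of "retrieval and routing" (activating only the top-k most relevant bricks for an input) and "merging" (averaging the parameters of several bricks into one). All names here (Brick, route_top_k, merge_bricks) and the toy parameter vectors are illustrative assumptions, not an implementation from the paper.

```python
# Illustrative sketch of brick routing and merging, assuming each brick's
# parameters can be represented as a flat list of floats.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Brick:
    name: str
    weights: List[float]  # stand-in for a functional module's parameters

def route_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Retrieval and routing: select the k bricks with the highest
    relevance scores for the current input."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def merge_bricks(bricks: List[Brick]) -> Brick:
    """Merging: element-wise average of brick parameters."""
    n = len(bricks)
    merged = [sum(ws) / n for ws in zip(*(b.weights for b in bricks))]
    return Brick(name="merged", weights=merged)

# Example: route an input to 2 of 4 toy bricks, then merge the selected ones.
bricks = {
    "math": Brick("math", [1.0, 0.0]),
    "code": Brick("code", [0.0, 1.0]),
    "law":  Brick("law",  [0.5, 0.5]),
    "bio":  Brick("bio",  [0.2, 0.8]),
}
scores = {"math": 0.9, "code": 0.7, "law": 0.1, "bio": 0.3}
selected = route_top_k(scores, k=2)                      # ['math', 'code']
merged = merge_bricks([bricks[name] for name in selected])
print(selected, merged.weights)                          # ['math', 'code'] [0.5, 0.5]
```

In practice a router would be a learned scoring network over hidden states and merging would operate on full weight tensors (often with task-specific coefficients), but the control flow — score, select a subset, optionally combine — is the same.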
