Configurable Foundation Models: Building LLMs from a Modular Perspective

Abstract

Recent advances in LLMs have exposed challenges in computational efficiency and continual scalability that stem from their very large parameter counts, making it increasingly cumbersome to apply and evolve these models on devices with limited computational resources and in scenarios that require diverse abilities. Inspired by the modularity of the human brain, there is a growing tendency to decompose LLMs into numerous functional modules, which allows inference with only a subset of the modules and dynamic assembly of modules to tackle complex tasks, as in mixture-of-experts. To highlight the inherent efficiency and composability of the modular approach, we coin the term "brick" for each functional module and refer to the modularized structure as configurable foundation models. In this paper, we offer a comprehensive overview and investigation of the construction, utilization, and limitations of configurable foundation models. We first formalize modules into emergent bricks (functional neuron partitions that emerge during the pre-training phase) and customized bricks (bricks constructed via additional post-training to improve the capabilities and knowledge of LLMs). Based on diverse functional bricks, we further present four brick-oriented operations: retrieval and routing, merging, updating, and growing. These operations allow LLMs to be configured dynamically, based on instructions, to handle complex tasks. To verify our perspective, we conduct an empirical analysis of widely used LLMs and find that the FFN layers follow modular patterns, with functional specialization of neurons and functional neuron partitions. Finally, we highlight several open issues and directions for future research. Overall, this paper aims to offer a fresh modular perspective on existing LLM research and to inspire the creation of more efficient and scalable foundation models.
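As a rough illustration of two of the brick-oriented operations named above (retrieval and routing, and merging), the following minimal Python sketch may help. It is not the paper's implementation: the names `route`, `run_selected`, and `merge`, the use of positive relevance scores, and the choice of linear interpolation for merging are all assumptions made purely for illustration.

```python
from typing import Callable, Dict, List

Vector = List[float]
# Hypothetical stand-in: a "brick" is any function mapping a vector to a
# vector of the same length.
Brick = Callable[[Vector], Vector]


def route(scores: Dict[str, float], k: int) -> List[str]:
    """Retrieval and routing: return the names of the k highest-scoring bricks."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:k]


def run_selected(bricks: Dict[str, Brick], scores: Dict[str, float],
                 x: Vector, k: int) -> Vector:
    """Run only the k selected bricks and mix their outputs.

    Assumes all scores are positive so they can be normalized into weights.
    Bricks that are not selected are never executed, which is where the
    efficiency of inference with a subset of modules comes from.
    """
    chosen = route(scores, k)
    total = sum(scores[name] for name in chosen)
    out = [0.0] * len(x)
    for name in chosen:
        weight = scores[name] / total
        y = bricks[name](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


def merge(weights_a: Vector, weights_b: Vector, alpha: float = 0.5) -> Vector:
    """Merging (assumed here to be linear interpolation of two parameter vectors)."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(weights_a, weights_b)]
```

In a real model the scores would come from a learned router or a retriever and the bricks would be neural sub-networks; plain Python functions are used here only so that the control flow is easy to follow.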
