Configurable Foundation Models: Building LLMs from a Modular Perspective

Abstract

Recent advances in LLMs have exposed challenges in computational efficiency and continual scalability: because these models require enormous numbers of parameters, applying and evolving them on devices with limited computational resources, or in scenarios that demand diverse abilities, is increasingly cumbersome. Inspired by the modularity of the human brain, there is a growing tendency to decompose LLMs into numerous functional modules, which allows inference with only a subset of the modules and dynamic assembly of modules to tackle complex tasks, as in mixture-of-experts. To highlight the inherent efficiency and composability of the modular approach, we coin the term "brick" for each functional module and call the resulting modularized structure a configurable foundation model. In this paper, we offer a comprehensive overview and investigation of the construction, utilization, and limitations of configurable foundation models. We first formalize modules into two kinds: emergent bricks, which are functional neuron partitions that emerge during the pre-training phase, and customized bricks, which are constructed through additional post-training to improve the capabilities and knowledge of LLMs. Based on diverse functional bricks, we then present four brick-oriented operations: retrieval and routing, merging, updating, and growing. These operations allow LLMs to be configured dynamically, based on instructions, to handle complex tasks. To verify our perspective, we conduct an empirical analysis of widely used LLMs and find that the FFN layers follow modular patterns, with functional specialization of neurons and functional neuron partitions. Finally, we highlight several open issues and directions for future research. Overall, this paper aims to offer a fresh modular perspective on existing LLM research and to inspire the creation of more efficient and scalable foundation models.
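As a rough illustration of two of the brick-oriented operations named above (routing and merging), the following minimal sketch may help. It is not the paper's implementation: every name in it (`route_top_k`, `run_bricks`, `merge_bricks`) is hypothetical, bricks are modeled as plain Python callables, router scores are assumed to be given, and linear interpolation of parameters is used only as one simple stand-in for merging.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Pick the k highest-scoring bricks (hypothetical helper).

    Returns (brick_index, weight) pairs, with the weights renormalized
    so that they sum to 1 over the selected bricks.
    """
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def run_bricks(x, bricks, router_scores, k=2):
    """Evaluate only the selected bricks and combine their outputs.

    Bricks that are not selected are never called, which is where the
    efficiency of inference with part of the modules comes from.
    """
    selected = route_top_k(router_scores, k)
    return sum(w * bricks[i](x) for i, w in selected)


def merge_bricks(params_a, params_b, alpha=0.5):
    """Merge two same-shaped bricks by linear interpolation (an assumption).

    Parameters are modeled as flat lists of floats for simplicity.
    """
    return [alpha * a + (1 - alpha) * b for a, b in zip(params_a, params_b)]


if __name__ == "__main__":
    # Three toy bricks; only the top-scoring ones are evaluated.
    bricks = [lambda x: x, lambda x: 2 * x, lambda x: 3 * x]
    print(route_top_k([0.0, 2.0, 1.0], k=2))
    print(run_bricks(3.0, bricks, [0.0, 2.0, 1.0], k=1))
    print(merge_bricks([1.0, 3.0], [3.0, 5.0]))
```

In this sketch the cost of a forward pass scales with `k` rather than with the total number of bricks, which is the intuition behind mixture-of-experts style routing. Updating and growing are not shown.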