FuseChat: Knowledge Fusion of Chat Models

Abstract

Training large language models (LLMs) from scratch can yield models with distinct capabilities and strengths, but it is costly and may produce redundant competencies. An alternative is to combine existing LLMs into a more powerful one, reducing the need for expensive pre-training. However, because LLMs differ in architecture, directly blending their parameters is infeasible. Recently, FuseLLM introduced knowledge fusion, which transfers the collective knowledge of multiple structurally varied LLMs into a target LLM through lightweight continual training. In this report, we extend the scalability and flexibility of the FuseLLM framework to the fusion of chat LLMs, resulting in FuseChat. FuseChat comprises two main stages. First, we perform knowledge fusion on source LLMs that vary in structure and scale, deriving multiple target LLMs of identical structure and size via lightweight fine-tuning. Second, these target LLMs are merged in parameter space, where we propose a novel method that determines the merging weights from the variation ratio of parameter matrices before and after fine-tuning. We validate our approach using three prominent chat LLMs with diverse architectures and scales: NH2-Mixtral-8x7B, NH2-Solar-10.7B, and OpenChat-3.5-7B. Experimental results spanning various chat domains show that FuseChat-7B outperforms a broad spectrum of chat LLMs at the 7B and 34B scales, even surpassing GPT-3.5 (March) and approaching Mixtral-8x7B-Instruct. Our code, model weights, and data are openly accessible at https://github.com/fanqiwan/FuseLLM.
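
The second stage can be illustrated with a small sketch. The abstract only states that merging weights depend on how much each parameter matrix changes during fine-tuning, so the exact formula below is an assumption: each target model's weight for a given matrix is taken to be its share of the total mean squared change from the shared pre-fine-tuning matrix. The names `mean_squared_change` and `merge_by_variation`, and the nested-list matrix representation, are hypothetical and chosen only for illustration; this is not the released FuseChat implementation.

```python
from typing import Dict, List

Matrix = List[List[float]]
StateDict = Dict[str, Matrix]  # parameter name -> matrix (hypothetical layout)


def mean_squared_change(base: Matrix, tuned: Matrix) -> float:
    """Average squared element-wise change of one matrix after fine-tuning."""
    diffs = [
        (t - b) ** 2
        for base_row, tuned_row in zip(base, tuned)
        for b, t in zip(base_row, tuned_row)
    ]
    return sum(diffs) / len(diffs)


def merge_by_variation(
    base: StateDict, targets: List[StateDict], eps: float = 1e-12
) -> StateDict:
    """Merge same-shaped target models, matrix by matrix.

    Assumption: a target's weight for a matrix is proportional to how much
    that matrix moved away from the base during fine-tuning.
    """
    merged: StateDict = {}
    for name, base_mat in base.items():
        # One score per target model for this matrix.
        scores = [mean_squared_change(base_mat, t[name]) for t in targets]
        total = sum(scores)
        if total < eps:
            # No target changed this matrix: fall back to a plain average.
            weights = [1.0 / len(targets)] * len(targets)
        else:
            weights = [s / total for s in scores]
        rows, cols = len(base_mat), len(base_mat[0])
        merged[name] = [
            [
                sum(w * t[name][i][j] for w, t in zip(weights, targets))
                for j in range(cols)
            ]
            for i in range(rows)
        ]
    return merged


# Toy example: two target models fine-tuned from the same base.
base = {"w": [[0.0, 0.0]]}
target_a = {"w": [[1.0, 0.0]]}  # small change
target_b = {"w": [[0.0, 3.0]]}  # large change, so it receives more weight
merged = merge_by_variation(base, [target_a, target_b])
```

In this toy case the second target moved its matrix further from the base, so it dominates the merged result. Computing the weights per matrix rather than once per model lets different targets dominate different parts of the network, which is the motivation for working at matrix granularity.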
