FuseChat: Knowledge Fusion of Chat Models

Abstract

While training large language models (LLMs) from scratch can produce models with distinct capabilities and strengths, it incurs substantial costs and may lead to redundant competencies. An alternative strategy is to combine existing LLMs into a more powerful LLM, reducing the need for expensive pre-training. However, because LLMs have diverse architectures, their parameters cannot be blended directly. Recently, FuseLLM introduced the concept of knowledge fusion, which transfers the collective knowledge of multiple structurally varied LLMs into a target LLM through lightweight continual training. In this report, we extend the scalability and flexibility of the FuseLLM framework to realize the fusion of chat LLMs, resulting in FuseChat. FuseChat comprises two main stages. First, we perform knowledge fusion on source LLMs that differ in structure and scale, deriving multiple target LLMs of identical structure and size via lightweight fine-tuning. Then, we merge these target LLMs in parameter space, proposing a novel method that determines the merging weights from the variation ratio of parameter matrices before and after fine-tuning. We validate our approach using three prominent chat LLMs with diverse architectures and scales: NH2-Mixtral-8x7B, NH2-Solar-10.7B, and OpenChat-3.5-7B. Experimental results spanning various chat domains demonstrate the superiority of FuseChat-7B across a broad spectrum of chat LLMs at 7B and 34B scales, even surpassing GPT-3.5 (March) and approaching Mixtral-8x7B-Instruct. Our code, model weights, and data are openly accessible at https://github.com/fanqiwan/FuseLLM.
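
The second stage, merging target LLMs with weights derived from how much each parameter matrix changed during fine-tuning, can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code below makes assumptions: "variation" is taken to be the mean squared change of a matrix relative to the shared pre-fine-tuning parameters, and the weights are those variations normalized across target models. The function name `variation_ratio_merge` and the dictionary layout are hypothetical and are not taken from the FuseLLM repository.

```python
import numpy as np


def variation_ratio_merge(base, targets, eps=1e-12):
    """Merge fine-tuned models that share one architecture (illustrative only).

    base:    dict mapping parameter name -> np.ndarray before fine-tuning
    targets: list of dicts with the same names and shapes, after fine-tuning
    """
    merged = {}
    for name, w0 in base.items():
        # Assumed variation measure: mean squared change of this matrix.
        deltas = np.array([np.mean((t[name] - w0) ** 2) for t in targets])
        total = deltas.sum()
        if total < eps:
            # No target model moved this matrix: fall back to a uniform average.
            weights = np.full(len(targets), 1.0 / len(targets))
        else:
            # Matrices that changed more during fine-tuning get larger weights.
            weights = deltas / total
        merged[name] = sum(w * t[name] for w, t in zip(weights, targets))
    return merged
```

Under these assumptions, a target model that barely changed a given matrix contributes little to that matrix in the merged model, while one that changed it substantially dominates. Weights are computed per matrix rather than once per model, which matches the abstract's reference to "parameter matrices" but is otherwise a design choice of this sketch.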
