FuseChat: Knowledge Fusion of Chat Models

Abstract

While training large language models (LLMs) from scratch can yield models with distinct capabilities and strengths, this approach incurs substantial costs and may result in redundant competencies. An alternative strategy is to combine existing LLMs into a more capable LLM, which reduces the need for expensive pre-training. However, because LLMs differ in architecture, directly blending their parameters is infeasible. Recently, FuseLLM introduced the concept of knowledge fusion, which transfers the collective knowledge of multiple structurally different LLMs into a target LLM through lightweight continual training. In this report, we extend the scalability and flexibility of the FuseLLM framework to fuse chat LLMs, resulting in FuseChat. FuseChat comprises two main stages. First, we perform knowledge fusion on source LLMs of varying structures and scales to derive, via lightweight fine-tuning, multiple target LLMs of identical structure and size. Then, these target LLMs are merged in parameter space, for which we propose a novel method that determines the merging weights from the variation ratio of the parameter matrices before and after fine-tuning. We validate our approach using three prominent chat LLMs with diverse architectures and scales: NH2-Mixtral-8x7B, NH2-Solar-10.7B, and OpenChat-3.5-7B. Experimental results spanning various chat domains show that FuseChat-7B outperforms a broad spectrum of chat LLMs at the 7B and 34B scales, even surpassing GPT-3.5 (March) and approaching Mixtral-8x7B-Instruct. Our code, model weights, and data are openly available at https://github.com/fanqiwan/FuseLLM.
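The parameter-space merging step can be illustrated with a minimal sketch. The abstract does not state the exact formula, so the code below rests on assumptions: all target models start from the same pre-fine-tuning weights; a target model's "variation" for a parameter matrix is taken to be the mean squared change of that matrix during fine-tuning; and its merging weight for that matrix is its variation divided by the sum over all target models. The functions `variation_ratio_weights` and `merge` are hypothetical names, not part of the FuseChat code base, and models are represented as plain dicts of NumPy arrays.

```python
import numpy as np


def variation_ratio_weights(
    before: dict[str, np.ndarray],
    afters: list[dict[str, np.ndarray]],
) -> list[dict[str, float]]:
    """Hypothetical per-matrix merging weights for fine-tuned target models.

    before: parameters shared by all targets before fine-tuning (assumption).
    afters: one parameter dict per fine-tuned target model.
    Returns one dict per target; for each matrix name the weights sum to 1.
    """
    weights: list[dict[str, float]] = [{} for _ in afters]
    for name, base in before.items():
        # Assumed definition of "variation": mean squared change of the matrix.
        variations = np.array([np.mean((after[name] - base) ** 2) for after in afters])
        total = variations.sum()
        if total == 0:
            # No target changed this matrix: fall back to uniform weights.
            ratios = np.full(len(afters), 1.0 / len(afters))
        else:
            # Each target's share of the total variation becomes its weight.
            ratios = variations / total
        for w, r in zip(weights, ratios):
            w[name] = float(r)
    return weights


def merge(
    afters: list[dict[str, np.ndarray]],
    weights: list[dict[str, float]],
) -> dict[str, np.ndarray]:
    """Weighted average of the target models, computed matrix by matrix."""
    return {
        name: sum(w[name] * after[name] for w, after in zip(weights, afters))
        for name in afters[0]
    }
```

For example, if one target model shifts every entry of a matrix by 1 and another by 3, their mean squared changes are 1 and 9, giving weights 0.1 and 0.9, so each merged entry is 0.1 · 1 + 0.9 · 3 = 2.8. This choice lets, for each matrix, the model that moved furthest from the shared starting point dominate, on the assumption that larger updates carry more of the knowledge gained during fusion; FuseChat's actual definition of the variation ratio may differ.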
