

Model Merging and Safety Alignment: One Bad Model Spoils the Bunch

June 20, 2024
Authors: Hasan Abed Al Kader Hammoud, Umberto Michieli, Fabio Pizzati, Philip Torr, Adel Bibi, Bernard Ghanem, Mete Ozay
cs.AI

Abstract

Merging Large Language Models (LLMs) is a cost-effective technique for combining multiple expert LLMs into a single versatile model, retaining the expertise of the original ones. However, current approaches often overlook the importance of safety alignment during merging, leading to highly misaligned models. This work investigates the effects of model merging on alignment. We evaluate several popular model merging techniques, demonstrating that existing methods not only transfer domain expertise but also propagate misalignment. We propose a simple two-step approach to address this problem: (i) generating synthetic safety and domain-specific data, and (ii) incorporating this generated data into the optimization process of existing data-aware model merging techniques. This allows us to treat alignment as a skill that can be maximized in the resulting merged LLM. Our experiments illustrate the effectiveness of integrating alignment-related data during merging, resulting in models that excel in both domain expertise and alignment.
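
To make the two-step idea concrete, below is a minimal Python sketch, not the authors' implementation. Step (i) is represented abstractly by the score_domain and score_safety callables, which stand in for evaluation on the pre-generated synthetic domain and safety data; step (ii) is a simple grid search over merge coefficients so that alignment is scored, and thus maximized, alongside domain skill. All names here (merge_state_dicts, search_merge_weights, score_domain, score_safety, alpha) are hypothetical, and the linear weight averaging is just one of the data-aware merging schemes the paper evaluates.

import itertools
import torch

def merge_state_dicts(state_dicts, weights):
    # Weighted average of expert parameters: theta = sum_i w_i * theta_i.
    merged = {}
    for key in state_dicts[0]:
        acc = torch.zeros_like(state_dicts[0][key], dtype=torch.float32)
        for w, sd in zip(weights, state_dicts):
            acc += w * sd[key].float()
        merged[key] = acc
    return merged

def search_merge_weights(state_dicts, score_domain, score_safety,
                         alpha=0.5, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    # Grid-search convex merge coefficients; each candidate merge is
    # scored on both domain data and synthetic safety data, so
    # alignment is treated as a skill to maximize, not a side effect.
    best_weights, best_score = None, float("-inf")
    for raw in itertools.product(grid, repeat=len(state_dicts)):
        total = sum(raw)
        if total == 0:
            continue
        weights = [w / total for w in raw]  # normalize to a convex combination
        merged = merge_state_dicts(state_dicts, weights)
        score = (1 - alpha) * score_domain(merged) + alpha * score_safety(merged)
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score

With alpha = 0 this reduces to merging purely for domain skill, the setting in which the paper shows misalignment propagates from a single bad expert; alpha > 0 injects the alignment objective into the merge search.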
