AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization

March 31, 2025
Authors: Yiyang Du, Xiaochen Wang, Chi Chen, Jiabo Ye, Yiru Wang, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Zhifang Sui, Maosong Sun, Yang Liu
cs.AI

Abstract

Recently, model merging methods have proven effective at combining the task-specific abilities of multiple Large Language Models (LLMs). However, previous model merging methods mainly focus on homogeneous models with identical architectures, and they face challenges when applied to Multimodal Large Language Models (MLLMs), which are inherently heterogeneous: they differ in model architecture and exhibit asymmetry in the parameter space. In this work, we propose AdaMMS, a novel model merging method tailored for heterogeneous MLLMs. Our method tackles these challenges in three steps: mapping, merging, and searching. Specifically, we first design a mapping function between models so that model merging can be applied to MLLMs with different architectures. Then we apply linear interpolation on model weights to actively adapt to the asymmetry in the heterogeneous MLLMs. Finally, in the hyper-parameter searching step, we propose an unsupervised hyper-parameter selection method for model merging. AdaMMS is the first model merging method capable of merging heterogeneous MLLMs without labeled data, and extensive experiments on various model combinations demonstrate that it outperforms previous model merging methods on various vision-language benchmarks.
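The three steps named in the abstract (mapping, merging, searching) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the parameter names, the `name_map` dictionary, and the helper functions are hypothetical, weights are plain lists of floats rather than tensors, and the abstract does not say which unsupervised criterion is used for selection, so the sketch takes a caller-supplied `score` function that is assumed to need no labels.

```python
from typing import Callable, Dict, List, Sequence

# Toy stand-in for a model state dict: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def map_params(other: Weights, name_map: Dict[str, str]) -> Weights:
    """Mapping step (hypothetical): rename the other model's parameters into
    the base model's naming scheme and drop those with no counterpart."""
    return {name_map[k]: v for k, v in other.items() if k in name_map}


def merge(base: Weights, mapped: Weights, alpha: float) -> Weights:
    """Merging step: linear interpolation (1 - alpha) * base + alpha * other
    on parameters both models share; all other parameters keep base values."""
    merged: Weights = {}
    for name, w in base.items():
        if name in mapped and len(mapped[name]) == len(w):
            merged[name] = [(1 - alpha) * a + alpha * b
                            for a, b in zip(w, mapped[name])]
        else:
            merged[name] = list(w)
    return merged


def search_alpha(base: Weights, mapped: Weights,
                 candidates: Sequence[float],
                 score: Callable[[Weights], float]) -> float:
    """Searching step: return the candidate coefficient whose merged model
    scores highest. `score` is a placeholder for a label-free criterion."""
    return max(candidates, key=lambda a: score(merge(base, mapped, a)))


# Usage with made-up parameter names and values.
base = {"llm.layer0.weight": [1.0, 2.0], "vision.proj": [0.5]}
other = {"model.layers.0.w": [3.0, 4.0], "audio.proj": [9.0]}
name_map = {"model.layers.0.w": "llm.layer0.weight"}

mapped = map_params(other, name_map)
merged = merge(base, mapped, alpha=0.5)

# Dummy score for demonstration only; it prefers a first weight near 2.0.
best = search_alpha(base, mapped, [0.0, 0.5, 1.0],
                    lambda w: -abs(w["llm.layer0.weight"][0] - 2.0))
```

In this sketch, parameters that exist in only one model (here `vision.proj` and `audio.proj`) are never interpolated, which is one simple way to handle architectural differences; the paper's actual mapping function may behave differently.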

April 2, 2025