

Evolutionary Optimization of Model Merging Recipes

March 19, 2024
Authors: Takuya Akiba, Makoto Shing, Yujin Tang, Qi Sun, David Ha
cs.AI

Abstract

We present a novel application of evolutionary algorithms to automate the creation of powerful foundation models. While model merging has emerged as a promising approach for LLM development due to its cost-effectiveness, it currently relies on human intuition and domain knowledge, limiting its potential. Here, we propose an evolutionary approach that overcomes this limitation by automatically discovering effective combinations of diverse open-source models, harnessing their collective intelligence without requiring extensive additional training data or compute. Our approach operates in both parameter space and data flow space, allowing for optimization beyond just the weights of the individual models. This approach even facilitates cross-domain merging, generating models like a Japanese LLM with Math reasoning capabilities. Surprisingly, our Japanese Math LLM achieved state-of-the-art performance on a variety of established Japanese LLM benchmarks, even surpassing models with significantly more parameters, despite not being explicitly trained for such tasks. Furthermore, a culturally-aware Japanese VLM generated through our approach demonstrates its effectiveness in describing Japanese culture-specific content, outperforming previous Japanese VLMs. This work not only contributes new state-of-the-art models back to the open-source community, but also introduces a new paradigm for automated model composition, paving the way for exploring alternative, efficient approaches to foundation model development.
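The abstract's core idea, searching for merging recipes in parameter space rather than hand-tuning them, can be illustrated with a small sketch. The code below is not the authors' implementation; it is a minimal example, assuming a simple (1+λ) evolution strategy over per-layer interpolation coefficients between two checkpoints. The helper names (`merge_state_dicts`, `evolve_recipe`) and the `evaluate` fitness callback (e.g., scoring the merged model on a held-out benchmark) are hypothetical placeholders.

```python
# Minimal sketch of parameter-space merging under evolutionary search.
# Assumes two compatible state dicts and a user-supplied fitness function.
import copy
import numpy as np

def merge_state_dicts(sd_a, sd_b, coeffs, layer_keys):
    """Linearly interpolate matching tensors: w = (1 - c) * a + c * b,
    with one coefficient per layer group."""
    merged = copy.deepcopy(sd_a)
    for c, keys in zip(coeffs, layer_keys):
        for k in keys:
            merged[k] = (1.0 - c) * sd_a[k] + c * sd_b[k]
    return merged

def evolve_recipe(sd_a, sd_b, layer_keys, evaluate, generations=30,
                  pop_size=16, sigma=0.1, seed=0):
    """Simple (1+lambda) evolution strategy over merging coefficients.
    `evaluate(state_dict) -> float` is a hypothetical fitness function,
    e.g. accuracy on a held-out benchmark for the target capability."""
    rng = np.random.default_rng(seed)
    best = np.full(len(layer_keys), 0.5)  # start from an even blend
    best_fit = evaluate(merge_state_dicts(sd_a, sd_b, best, layer_keys))
    for _ in range(generations):
        for _ in range(pop_size):
            # Mutate the current best recipe and keep it if fitness improves.
            cand = np.clip(best + rng.normal(0.0, sigma, best.shape), 0.0, 1.0)
            fit = evaluate(merge_state_dicts(sd_a, sd_b, cand, layer_keys))
            if fit > best_fit:
                best, best_fit = cand, fit
    return best, best_fit
```

The data-flow-space merging described in the abstract (recombining which layers from which model are traversed, not just their weights) would require a different search space than the interpolation coefficients shown here.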

