DMM: Building a Versatile Image Generation Model via Distillation-Based Model Merging

Abstract

The success of text-to-image (T2I) generation models has led to a proliferation of checkpoints fine-tuned from the same base model on various specialized datasets. This flood of specialized models brings high parameter redundancy and large storage costs, and calls for effective methods that consolidate the capabilities of diverse, powerful models into a single one. A common practice in model merging is static linear interpolation in the parameter space to achieve style mixing. However, this practice neglects a characteristic of the T2I generation task: the many distinct models cover a wide variety of styles, which may lead to incompatibility and confusion in the merged model. To address this issue, we introduce a style-promptable image generation pipeline that can accurately generate arbitrary-style images under the control of style vectors. Based on this design, we propose a score-distillation-based model merging paradigm (DMM) that compresses multiple models into a single versatile T2I model. Moreover, we rethink and reformulate the model merging task in the context of T2I generation by presenting new merging goals and evaluation protocols. Our experiments demonstrate that DMM can compactly reorganize the knowledge from multiple teacher models and achieve controllable arbitrary-style generation.
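
For readers unfamiliar with the baseline the abstract contrasts against, the following is a minimal sketch of static linear interpolation in the parameter space. It is not DMM and is not taken from the paper. The function name `merge_linear`, the parameter names such as `unet.w`, and the toy values are all hypothetical; real checkpoints would hold tensors rather than short lists of floats.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a checkpoint: parameter name -> flattened values.
Weights = Dict[str, List[float]]


def merge_linear(checkpoints: Sequence[Weights], coeffs: Sequence[float]) -> Weights:
    """Static linear interpolation: merged = sum_i coeffs[i] * checkpoints[i]."""
    if not checkpoints or len(checkpoints) != len(coeffs):
        raise ValueError("need exactly one coefficient per checkpoint")
    if abs(sum(coeffs) - 1.0) > 1e-8:
        raise ValueError("coefficients should sum to 1")
    names = set(checkpoints[0])
    if any(set(ckpt) != names for ckpt in checkpoints):
        raise ValueError("all checkpoints must share the same parameter names")

    merged: Weights = {}
    for name in checkpoints[0]:
        # Group the i-th value of this parameter across all checkpoints.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        merged[name] = [sum(c * v for c, v in zip(coeffs, col)) for col in columns]
    return merged


# Toy example: two "style" checkpoints fine-tuned from the same base model.
realistic = {"unet.w": [1.0, 2.0], "unet.b": [0.0]}
anime = {"unet.w": [3.0, 6.0], "unet.b": [4.0]}
merged = merge_linear([realistic, anime], [0.5, 0.5])
```

The coefficients are fixed at merge time, so the result is one blended set of weights. That is the sense in which the interpolation is static, and it is the setting in which the abstract says styles may become incompatible or confused.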
