T-LoRA: Single Image Diffusion Model Customization Without Overfitting

Abstract

While diffusion model fine-tuning offers a powerful approach for customizing pre-trained models to generate specific objects, it frequently suffers from overfitting when training samples are limited, compromising both generalization capability and output diversity. This paper tackles the challenging yet most impactful task of adapting a diffusion model using just a single concept image, as single-image customization holds the greatest practical potential. We introduce T-LoRA, a Timestep-Dependent Low-Rank Adaptation framework specifically designed for diffusion model personalization. We show that higher diffusion timesteps are more prone to overfitting than lower ones, which calls for a timestep-sensitive fine-tuning strategy. T-LoRA incorporates two key innovations: (1) a dynamic fine-tuning strategy that adjusts rank-constrained updates based on diffusion timesteps, and (2) a weight parametrization technique that ensures independence between adapter components through orthogonal initialization. Extensive experiments show that T-LoRA and its individual components outperform standard LoRA and other diffusion model personalization techniques. They achieve a superior balance between concept fidelity and text alignment, highlighting the potential of T-LoRA in data-limited and resource-constrained scenarios. Code is available at https://github.com/ControlGenAI/T-LoRA.
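
The two ideas named in the abstract can be illustrated with a minimal, hedged sketch. It is not the authors' implementation (see the repository above for that). The linear rank schedule, the choice to use fewer rank components at higher timesteps, the Gram-Schmidt initialization, and all function names (`active_rank`, `rank_mask`, `orthonormal_rows`, `lora_delta`) are assumptions made here for illustration only.

```python
import math
import random


def active_rank(t, num_timesteps, r_max, r_min=1):
    """Assumed linear schedule: full rank at t=0, r_min at t=num_timesteps."""
    frac = (num_timesteps - t) / num_timesteps  # 1.0 at t=0, 0.0 at t=T
    return r_min + math.floor((r_max - r_min) * frac)


def rank_mask(t, num_timesteps, r_max, r_min=1):
    """Binary mask that keeps only the first active_rank(t) adapter components."""
    r = active_rank(t, num_timesteps, r_max, r_min)
    return [1.0 if i < r else 0.0 for i in range(r_max)]


def orthonormal_rows(n_rows, dim, rng):
    """Gram-Schmidt on random Gaussian vectors (requires n_rows <= dim)."""
    rows = []
    while len(rows) < n_rows:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for u in rows:  # remove the components along the rows found so far
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip degenerate draws
            rows.append([a / norm for a in v])
    return rows


def lora_delta(x, down, up, mask, scale=1.0):
    """Masked low-rank update: scale * up^T (mask * (down @ x)).

    down has shape (r, d_in) and up has shape (r, d_out), both as lists of rows.
    """
    hidden = [m * sum(w * xi for w, xi in zip(row, x))
              for row, m in zip(down, mask)]
    d_out = len(up[0])
    return [scale * sum(hidden[k] * up[k][j] for k in range(len(hidden)))
            for j in range(d_out)]


if __name__ == "__main__":
    rng = random.Random(0)
    r_max, d_in, d_out, T = 4, 8, 8, 1000
    down = orthonormal_rows(r_max, d_in, rng)  # hypothetical adapter factors
    up = orthonormal_rows(r_max, d_out, rng)
    x = [rng.gauss(0.0, 1.0) for _ in range(d_in)]
    for t in (0, 500, 1000):
        mask = rank_mask(t, T, r_max)
        print(t, mask, lora_delta(x, down, up, mask))
```

In this sketch, a noisy (high) timestep uses only a few adapter components while a low timestep uses all of them, and the orthonormal rows keep the masked components from being redundant copies of the active ones. Standard LoRA initializes one factor to zero so that the adapter contributes nothing at the start of training; how an orthogonally initialized adapter preserves that property is not covered by the abstract and is left out here.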
