ConsisLoRA: Enhancing Content and Style Consistency for LoRA-based Style Transfer

Abstract

Style transfer involves transferring the style from a reference image to the content of a target image. Recent advances in LoRA-based (Low-Rank Adaptation) methods have shown promise in capturing the style of a single image. However, these approaches still face significant challenges: content inconsistency, style misalignment, and content leakage. In this paper, we comprehensively analyze the limitations of the standard diffusion parameterization, which learns to predict noise, in the context of style transfer. To address these issues, we introduce ConsisLoRA, a LoRA-based method that enhances both content and style consistency by optimizing the LoRA weights to predict the original image rather than noise. We also propose a two-step training strategy that decouples the learning of content and style from the reference image. To capture both the global structure and the local details of the content image, we introduce a stepwise loss transition strategy. Additionally, we present an inference guidance method that enables continuous control over content and style strengths during inference. Through both qualitative and quantitative evaluations, we show that our method significantly improves content and style consistency while effectively reducing content leakage.
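To make the distinction between predicting noise and predicting the original image concrete, the sketch below compares the two objectives using only the standard DDPM forward-process identity x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε. It is not the authors' implementation. The function names, the toy 16-element "image", and the constant-error stand-in for a model output are all assumptions made for illustration.

```python
import math
import random

def add_noise(x0, eps, alpha_bar):
    """Forward diffusion: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]

def x0_from_eps(x_t, eps_pred, alpha_bar):
    """Invert the forward process: turn a noise prediction into an image estimate."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xt - b * e) / a for xt, e in zip(x_t, eps_pred)]

def mse(u, v):
    """Mean squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(u, v)) / len(u)

def losses(x0, eps, eps_pred, alpha_bar):
    """Return (noise-prediction loss, image-prediction loss) for one timestep."""
    x_t = add_noise(x0, eps, alpha_bar)
    noise_loss = mse(eps_pred, eps)
    image_loss = mse(x0_from_eps(x_t, eps_pred, alpha_bar), x0)
    return noise_loss, image_loss

rng = random.Random(0)
x0 = [rng.uniform(-1, 1) for _ in range(16)]   # toy "image" (hypothetical data)
eps = [rng.gauss(0, 1) for _ in range(16)]     # true noise
eps_pred = [e + 0.1 for e in eps]              # stand-in model output with a constant error

for alpha_bar in (0.9, 0.5, 0.1):              # low noise -> high noise
    noise_loss, image_loss = losses(x0, eps, eps_pred, alpha_bar)
    print(alpha_bar, noise_loss, image_loss)
```

For the same noise-prediction error, the image-space loss equals the noise-space loss scaled by (1 - ᾱ_t) / ᾱ_t, so it grows as the timestep becomes noisier. The two objectives therefore weight timesteps differently even though they are computed from the same network output.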
