ConsisLoRA: Enhancing Content and Style Consistency for LoRA-based Style Transfer

Abstract

Style transfer involves transferring the style from a reference image to the content of a target image. Recent advancements in LoRA-based (Low-Rank Adaptation) methods have shown promise in effectively capturing the style of a single image. However, these approaches still face significant challenges such as content inconsistency, style misalignment, and content leakage. In this paper, we comprehensively analyze the limitations of the standard diffusion parameterization, which learns to predict noise, in the context of style transfer. To address these issues, we introduce ConsisLoRA, a LoRA-based method that enhances both content and style consistency by optimizing the LoRA weights to predict the original image rather than noise. We also propose a two-step training strategy that decouples the learning of content and style from the reference image. To effectively capture both the global structure and local details of the content image, we introduce a stepwise loss transition strategy. Additionally, we present an inference guidance method that enables continuous control over content and style strengths during inference. Qualitative and quantitative evaluations show that our method significantly improves content and style consistency while effectively reducing content leakage.
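
To make the difference between predicting noise and predicting the original image concrete, the sketch below uses the standard DDPM forward process, x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps. It is not the authors' implementation: the function names, the linear beta schedule, and the use of plain NumPy arrays in place of a diffusion network with LoRA weights are all assumptions made for illustration. It shows only how an image estimate can be recovered from a noise prediction, and how a loss measured on that estimate relates to the usual noise loss.

```python
import numpy as np


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal coefficients for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def add_noise(x0, eps, alpha_bar_t):
    """Forward diffusion: mix the clean image x0 with Gaussian noise eps."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps


def x0_from_eps(x_t, eps_pred, alpha_bar_t):
    """Invert the forward process to get the image implied by a noise prediction."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)


def eps_loss(eps_pred, eps):
    """Standard parameterization: mean squared error on the noise."""
    return float(np.mean((eps_pred - eps) ** 2))


def x0_loss(x_t, eps_pred, x0, alpha_bar_t):
    """Image-space objective: mean squared error on the recovered image."""
    x0_pred = x0_from_eps(x_t, eps_pred, alpha_bar_t)
    return float(np.mean((x0_pred - x0) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_bar = make_alpha_bar()
    x0 = rng.normal(size=(4, 8, 8))        # stand-in for an image or latent
    eps = rng.normal(size=x0.shape)        # true noise
    # Hypothetical imperfect network output: true noise plus a small error.
    eps_pred = eps + 0.1 * rng.normal(size=x0.shape)
    for t in (50, 500, 950):
        x_t = add_noise(x0, eps, alpha_bar[t])
        print(t, eps_loss(eps_pred, eps), x0_loss(x_t, eps_pred, x0, alpha_bar[t]))
```

When the network output is a noise prediction, the image-space loss equals the noise loss scaled by (1 - alpha_bar_t) / alpha_bar_t, so the same prediction error counts for much more at noisy timesteps than at nearly clean ones. How ConsisLoRA weights or schedules its objective in practice is not specified in the abstract.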
