

Style Aligned Image Generation via Shared Attention

December 4, 2023
Authors: Amir Hertz, Andrey Voynov, Shlomi Fruchter, Daniel Cohen-Or
cs.AI

Abstract

Large-scale Text-to-Image (T2I) models have rapidly gained prominence across creative fields, generating visually compelling outputs from textual prompts. However, controlling these models to ensure consistent style remains challenging, with existing methods necessitating fine-tuning and manual intervention to disentangle content and style. In this paper, we introduce StyleAligned, a novel technique designed to establish style alignment among a series of generated images. By employing minimal "attention sharing" during the diffusion process, our method maintains style consistency across images within T2I models. This approach allows for the creation of style-consistent images using a reference style through a straightforward inversion operation. Our method's evaluation across diverse styles and text prompts demonstrates high-quality synthesis and fidelity, underscoring its efficacy in achieving consistent style across various inputs.
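The "attention sharing" idea in the abstract can be illustrated with a minimal, framework-free sketch: during self-attention, a target image's queries attend over its own keys/values concatenated with those of a reference image, letting the reference's style statistics influence the target. The function name, tensor shapes, and single-head form below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def shared_self_attention(q_tgt, k_tgt, v_tgt, k_ref, v_ref):
    """Toy sketch of shared attention: target queries attend over the
    target's AND the reference's keys/values, so style information from
    the reference leaks into the target's features."""
    k = np.concatenate([k_tgt, k_ref], axis=0)  # (n_tgt + n_ref, d)
    v = np.concatenate([v_tgt, v_ref], axis=0)
    scores = q_tgt @ k.T / np.sqrt(q_tgt.shape[-1])  # (n_tgt, n_tgt + n_ref)
    return softmax(scores, axis=-1) @ v              # (n_tgt, d)

# Usage with random features standing in for diffusion-UNet activations:
rng = np.random.default_rng(0)
n_tgt, n_ref, d = 4, 3, 8
q = rng.normal(size=(n_tgt, d))
k_t, v_t = rng.normal(size=(n_tgt, d)), rng.normal(size=(n_tgt, d))
k_r, v_r = rng.normal(size=(n_ref, d)), rng.normal(size=(n_ref, d))
out = shared_self_attention(q, k_t, v_t, k_r, v_r)
```

Setting the reference keys/values to the target's own (i.e. no sharing) recovers plain self-attention, which is why the method is described as a "minimal" change to the diffusion model.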