Style Aligned Image Generation via Shared Attention
December 4, 2023
Authors: Amir Hertz, Andrey Voynov, Shlomi Fruchter, Daniel Cohen-Or
cs.AI
Abstract
Large-scale Text-to-Image (T2I) models have rapidly gained prominence across
creative fields, generating visually compelling outputs from textual prompts.
However, controlling these models to ensure consistent style remains
challenging, with existing methods necessitating fine-tuning and manual
intervention to disentangle content and style. In this paper, we introduce
StyleAligned, a novel technique designed to establish style alignment among a
series of generated images. By employing minimal "attention sharing" during the
diffusion process, our method maintains style consistency across images within
T2I models. This approach allows for the creation of style-consistent images
using a reference style through a straightforward inversion operation. Our
method's evaluation across diverse styles and text prompts demonstrates
high-quality synthesis and fidelity, underscoring its efficacy in achieving
consistent style across various inputs.
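To make the "attention sharing" idea concrete, here is a minimal sketch of how a batch of images being denoised together could share self-attention with a reference image. It is an illustration under stated assumptions, not the paper's implementation: the function name `shared_attention`, the tensor layout, and the choice to simply concatenate the reference keys/values are assumptions, and the paper's additional details (e.g., query/key normalization) are omitted.

```python
import torch
import torch.nn.functional as F

def shared_attention(q, k, v, ref_index=0):
    """Minimal sketch of shared self-attention across a batch of images.

    q, k, v: tensors of shape (batch, heads, tokens, dim) taken from a
    self-attention layer of a diffusion model. Each image in the batch
    attends to its own tokens *and* to the tokens of a designated
    reference image, nudging all outputs toward a shared style.
    This is an illustrative approximation, not the authors' exact code.
    """
    # Keys/values of the reference image, broadcast to the full batch.
    ref_k = k[ref_index:ref_index + 1].expand_as(k)
    ref_v = v[ref_index:ref_index + 1].expand_as(v)

    # Concatenate along the token axis: each query now also sees the
    # reference image's keys and values.
    k_shared = torch.cat([k, ref_k], dim=2)
    v_shared = torch.cat([v, ref_v], dim=2)

    return F.scaled_dot_product_attention(q, k_shared, v_shared)
```

In a real pipeline, a hook like this would replace the self-attention call in the model's transformer blocks during sampling, leaving weights untouched; generating from a reference style then amounts to inverting the reference image and including it in the batch.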