Story-Adapter: A Training-free Iterative Framework for Long Story Visualization
October 8, 2024
Authors: Jiawei Mao, Xiaoke Huang, Yunfei Xie, Yuanqi Chang, Mude Hui, Bingjie Xu, Yuyin Zhou
cs.AI
Abstract
Story visualization, the task of generating coherent images based on a
narrative, has seen significant advancements with the emergence of
text-to-image models, particularly diffusion models. However, maintaining
semantic consistency, generating high-quality fine-grained interactions, and
ensuring computational feasibility remain challenging, especially in long story
visualization (i.e., up to 100 frames). In this work, we propose a
training-free and computationally efficient framework, termed Story-Adapter, to
enhance generative capability for long stories. Specifically, we propose an
iterative paradigm to refine each generated image, leveraging both the text
prompt and all generated images from the previous iteration. Central to our
framework is a training-free global reference cross-attention module, which
aggregates all generated images from the previous iteration to preserve
semantic consistency across the entire story, while minimizing computational
costs with global embeddings. This iterative process progressively optimizes
image generation by repeatedly incorporating text constraints, resulting in
more precise and fine-grained interactions. Extensive experiments validate the
superiority of Story-Adapter in improving both semantic consistency and
generative capability for fine-grained interactions, particularly in long story
scenarios. The project page and associated code can be accessed via
https://jwmao1.github.io/storyadapter.
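
The iterative paradigm described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the names `visualize_story`, `reference_attention`, `toy_text_embed`, and `toy_generate` are hypothetical, images are stood in for by short vectors, and a blending weight replaces a real diffusion model. The sketch also assumes the first round is generated from text alone, which the abstract does not state. What it tries to show is the structure only: each round regenerates every frame from its text prompt plus one global embedding per frame from the previous round.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reference_attention(query: Vector, refs: Sequence[Vector]) -> Vector:
    """Scaled dot-product attention from one query over the reference embeddings.

    Each reference image contributes a single global embedding, so the cost
    grows with the number of frames rather than with the number of patches.
    """
    d = len(query)
    scores = [sum(q * r for q, r in zip(query, ref)) / math.sqrt(d) for ref in refs]
    weights = softmax(scores)
    return [sum(w * ref[i] for w, ref in zip(weights, refs)) for i in range(d)]


def toy_text_embed(prompt: str, dim: int = 4) -> Vector:
    """Hypothetical stand-in for a text encoder (deterministic, not meaningful)."""
    vec = [0.0] * dim
    for i, ch in enumerate(prompt):
        vec[i % dim] += ord(ch) / 1000.0
    return vec


def toy_generate(prompt: str, refs: Sequence[Vector], ref_weight: float = 0.3) -> Vector:
    """Hypothetical stand-in for a text-to-image model; returns a toy 'image' vector."""
    text = toy_text_embed(prompt)
    if not refs:
        return text  # assumed text-only initialization
    context = reference_attention(text, refs)
    return [(1 - ref_weight) * t + ref_weight * c for t, c in zip(text, context)]


def visualize_story(
    prompts: Sequence[str],
    generate: Callable[[str, Sequence[Vector]], Vector],
    embed: Callable[[Vector], Vector],
    num_iterations: int = 3,
) -> List[Vector]:
    """Iteratively refine all frames using all frames from the previous round."""
    images = [generate(p, []) for p in prompts]
    for _ in range(num_iterations):
        refs = [embed(img) for img in images]  # one global embedding per frame
        images = [generate(p, refs) for p in prompts]
    return images


prompts = ["a fox meets a crow", "the crow drops the cheese", "the fox runs away"]
frames = visualize_story(prompts, toy_generate, lambda img: img, num_iterations=2)
print(len(frames), "frames of dimension", len(frames[0]))
```

In this sketch every frame in a round sees the same set of reference embeddings, which is the property the abstract credits with preserving consistency across the whole story; the text prompt is re-applied in every round, mirroring the repeated text constraints.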