Audit & Repair: An Agentic Framework for Consistent Story Visualization in Text-to-Image Diffusion Models

June 23, 2025
Authors: Kiymet Akdemir, Tahira Kazimi, Pinar Yanardag
cs.AI

Abstract

Story visualization has become a popular task where visual scenes are generated to depict a narrative across multiple panels. A central challenge in this setting is maintaining visual consistency, particularly in how characters and objects persist and evolve throughout the story. Despite recent advances in diffusion models, current approaches often fail to preserve key character attributes, leading to incoherent narratives. In this work, we propose a collaborative multi-agent framework that autonomously identifies, corrects, and refines inconsistencies across multi-panel story visualizations. The agents operate in an iterative loop, enabling fine-grained, panel-level updates without re-generating entire sequences. Our framework is model-agnostic and flexibly integrates with a variety of diffusion models, including rectified flow transformers such as Flux and latent diffusion models such as Stable Diffusion. Quantitative and qualitative experiments show that our method outperforms prior approaches in terms of multi-panel consistency.
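
The iterative, panel-level loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the agents, their prompts, or how inconsistencies are detected. Here every name (`Panel`, `audit`, `repair`, `audit_and_repair`, `max_rounds`) is hypothetical, each panel's image is replaced by a plain dictionary of observed attributes, and "repair" simply overwrites the flagged attributes where a real system would call a diffusion model to edit or re-generate that one panel.

```python
from dataclasses import dataclass


@dataclass
class Panel:
    prompt: str
    # Toy stand-in for what a vision model might observe in the rendered image.
    attributes: dict[str, str]


def audit(panels, reference):
    """Return {panel_index: [attribute names that disagree with the reference]}."""
    issues = {}
    for i, panel in enumerate(panels):
        bad = [k for k, v in reference.items() if panel.attributes.get(k) != v]
        if bad:
            issues[i] = bad
    return issues


def repair(panel, bad_keys, reference):
    """Toy repair: rebuild a single panel with the flagged attributes corrected.

    A real system would edit or re-generate the image here (assumption).
    """
    fixed = dict(panel.attributes)
    for k in bad_keys:
        fixed[k] = reference[k]
    return Panel(panel.prompt, fixed)


def audit_and_repair(panels, reference, max_rounds=3):
    """Alternate audit and repair until no issues remain or rounds run out."""
    panels = list(panels)  # shallow copy; untouched panels keep their identity
    touched = set()
    for _ in range(max_rounds):
        issues = audit(panels, reference)
        if not issues:
            break
        for i, bad in issues.items():
            panels[i] = repair(panels[i], bad, reference)  # only flagged panels change
            touched.add(i)
    return panels, touched


# Hypothetical story: the character should keep red hair and a blue coat.
reference = {"hair": "red", "coat": "blue"}
story = [
    Panel("Mia enters the forest", {"hair": "red", "coat": "blue"}),
    Panel("Mia finds a lantern", {"hair": "brown", "coat": "blue"}),
    Panel("Mia returns home", {"hair": "red", "coat": "green"}),
]
fixed, touched = audit_and_repair(story, reference)
print(sorted(touched))
```

The point of the sketch is the control flow: the audit step flags individual panels, and only those panels are passed to the repair step, so consistent panels are never re-generated.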