Audit & Repair: An Agentic Framework for Consistent Story Visualization in Text-to-Image Diffusion Models
June 23, 2025
Authors: Kiymet Akdemir, Tahira Kazimi, Pinar Yanardag
cs.AI
Abstract
Story visualization has become a popular task where visual scenes are
generated to depict a narrative across multiple panels. A central challenge in
this setting is maintaining visual consistency, particularly in how characters
and objects persist and evolve throughout the story. Despite recent advances in
diffusion models, current approaches often fail to preserve key character
attributes, leading to incoherent narratives. In this work, we propose a
collaborative multi-agent framework that autonomously identifies, corrects, and
refines inconsistencies across multi-panel story visualizations. The agents
operate in an iterative loop, enabling fine-grained, panel-level updates
without re-generating entire sequences. Our framework is model-agnostic and
flexibly integrates with a variety of diffusion models, including rectified
flow transformers such as Flux and latent diffusion models such as Stable
Diffusion. Quantitative and qualitative experiments show that our method
outperforms prior approaches in terms of multi-panel consistency.
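
The iterative, panel-level loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Panel`, `audit_and_repair`, and the three callables (`generate`, `audit`, `repair`) are hypothetical, and the toy stand-ins below work on strings rather than real images. The sketch only assumes what the abstract states: agents detect an inconsistent panel, revise it, and regenerate that panel alone, with the generator passed in so that any diffusion backend could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Panel:
    prompt: str
    image: str  # stand-in for real image data (assumption for this sketch)


# Hypothetical agent interfaces; none of these names come from the paper.
Generator = Callable[[str], str]                 # prompt -> image
Auditor = Callable[[List[Panel]], Optional[int]]  # index of a bad panel, or None
Repairer = Callable[[List[Panel], int], str]      # revised prompt for that panel


def audit_and_repair(prompts: List[str], generate: Generator, audit: Auditor,
                     repair: Repairer, max_rounds: int = 5) -> List[Panel]:
    """Generate every panel once, then fix flagged panels one at a time."""
    panels = [Panel(p, generate(p)) for p in prompts]
    for _ in range(max_rounds):
        idx = audit(panels)
        if idx is None:  # no inconsistency found: stop early
            break
        new_prompt = repair(panels, idx)
        # Only the flagged panel is regenerated; the others are left untouched.
        panels[idx] = Panel(new_prompt, generate(new_prompt))
    return panels


# Toy stand-ins so the loop can be run end to end.
calls: List[str] = []


def toy_generate(prompt: str) -> str:
    calls.append(prompt)
    return f"img({prompt})"


def toy_audit(panels: List[Panel]) -> Optional[int]:
    # Flags the first panel whose "image" lacks the character's red scarf.
    for i, panel in enumerate(panels):
        if "red scarf" not in panel.image:
            return i
    return None


def toy_repair(panels: List[Panel], idx: int) -> str:
    return panels[idx].prompt + ", wearing a red scarf"


prompts = [
    "a fox wearing a red scarf in a forest",
    "a fox crossing a river",
    "a fox wearing a red scarf at home",
]
result = audit_and_repair(prompts, toy_generate, toy_audit, toy_repair)
```

In this toy run only the second panel is flagged, so the generator is called once more for that panel while the first and third panels keep their original images. In a real system the `generate` callable would wrap a diffusion pipeline such as Flux or Stable Diffusion, and the auditor would compare images rather than strings.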