Audit & Repair: An Agentic Framework for Consistent Story Visualization in Text-to-Image Diffusion Models

Abstract

Story visualization has become a popular task where visual scenes are generated to depict a narrative across multiple panels. A central challenge in this setting is maintaining visual consistency, particularly in how characters and objects persist and evolve throughout the story. Despite recent advances in diffusion models, current approaches often fail to preserve key character attributes, leading to incoherent narratives. In this work, we propose a collaborative multi-agent framework that autonomously identifies, corrects, and refines inconsistencies across multi-panel story visualizations. The agents operate in an iterative loop, enabling fine-grained, panel-level updates without re-generating entire sequences. Our framework is model-agnostic and flexibly integrates with a variety of diffusion models, including rectified flow transformers such as Flux and latent diffusion models such as Stable Diffusion. Quantitative and qualitative experiments show that our method outperforms prior approaches in terms of multi-panel consistency.
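The iterative, panel-level loop described above can be illustrated with a minimal sketch. This is not the framework itself: the names `Panel`, `audit`, `repair`, and `audit_and_repair` are hypothetical, each panel is represented by a dictionary of attributes rather than an image, and the "repair" step simply copies reference attributes where a real system would presumably call a diffusion model to edit or regenerate that single panel.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Panel:
    """Toy stand-in for a generated panel (hypothetical, for illustration only)."""
    prompt: str
    attributes: Dict[str, str]  # e.g. {"hair": "red"}; a real system would hold an image


def audit(panels: List[Panel], reference: Dict[str, str]) -> List[int]:
    """Return indices of panels whose attributes disagree with the reference."""
    return [
        i for i, panel in enumerate(panels)
        if any(panel.attributes.get(key) != value for key, value in reference.items())
    ]


def repair(panel: Panel, reference: Dict[str, str]) -> Panel:
    """Return a corrected copy of one panel (assumed stand-in for a per-panel edit)."""
    fixed = dict(panel.attributes)
    fixed.update(reference)
    return Panel(panel.prompt, fixed)


def audit_and_repair(
    panels: List[Panel], reference: Dict[str, str], max_rounds: int = 3
) -> List[Panel]:
    """Iterate audit -> repair, touching only the flagged panels in each round."""
    panels = list(panels)  # shallow copy so the caller's list is left unchanged
    for _ in range(max_rounds):
        flagged = audit(panels, reference)
        if not flagged:
            break  # every panel is consistent with the reference
        for i in flagged:
            panels[i] = repair(panels[i], reference)
    return panels


story = [
    Panel("A girl waves at the harbor", {"hair": "red"}),
    Panel("She boards a small boat", {"hair": "blue"}),  # inconsistent panel
    Panel("She sails into the sunset", {"hair": "red"}),
]
consistent_story = audit_and_repair(story, reference={"hair": "red"})
```

In this sketch only the second panel is replaced; the first and third are passed through untouched, which mirrors the stated goal of updating individual panels without re-generating the entire sequence.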
