ViStoryBench: Comprehensive Benchmark Suite for Story Visualization

Abstract

Story visualization, which aims to generate a sequence of visually coherent images aligning with a given narrative and reference images, has seen significant progress with recent advancements in generative models. To further enhance the performance of story visualization frameworks in real-world scenarios, we introduce a comprehensive evaluation benchmark, ViStoryBench. We collect a diverse dataset encompassing various story types and artistic styles, ensuring models are evaluated across multiple dimensions such as different plots (e.g., comedy, horror) and visual aesthetics (e.g., anime, 3D renderings). ViStoryBench is carefully curated to balance narrative structures and visual elements, featuring stories with single and multiple protagonists to test models' ability to maintain character consistency. Additionally, it includes complex plots and intricate world-building to challenge models in generating accurate visuals. To ensure comprehensive comparisons, our benchmark incorporates a wide range of evaluation metrics assessing critical aspects. This structured and multifaceted framework enables researchers to thoroughly identify both the strengths and weaknesses of different models, fostering targeted improvements.