ViStoryBench: A Comprehensive Benchmark Suite for Story Visualization

Abstract

Story visualization, which aims to generate a sequence of visually coherent images aligning with a given narrative and reference images, has seen significant progress with recent advancements in generative models. To further enhance the performance of story visualization frameworks in real-world scenarios, we introduce a comprehensive evaluation benchmark, ViStoryBench. We collect a diverse dataset encompassing various story types and artistic styles, ensuring models are evaluated across multiple dimensions such as different plots (e.g., comedy, horror) and visual aesthetics (e.g., anime, 3D renderings). ViStoryBench is carefully curated to balance narrative structures and visual elements, featuring stories with single and multiple protagonists to test models' ability to maintain character consistency. Additionally, it includes complex plots and intricate world-building to challenge models in generating accurate visuals. To ensure comprehensive comparisons, our benchmark incorporates a wide range of evaluation metrics assessing critical aspects. This structured and multifaceted framework enables researchers to thoroughly identify both the strengths and weaknesses of different models, fostering targeted improvements.