Story2Board: A Training-Free Approach for Expressive Storyboard Generation

August 13, 2025
Authors: David Dinkevich, Matan Levy, Omri Avrahami, Dvir Samuel, Dani Lischinski
cs.AI

Abstract

We present Story2Board, a training-free framework for expressive storyboard generation from natural language. Existing methods narrowly focus on subject identity, overlooking key aspects of visual storytelling such as spatial composition, background evolution, and narrative pacing. To address this, we introduce a lightweight consistency framework composed of two components: Latent Panel Anchoring, which preserves a shared character reference across panels, and Reciprocal Attention Value Mixing, which softly blends visual features between token pairs with strong reciprocal attention. Together, these mechanisms enhance coherence without architectural changes or fine-tuning, enabling state-of-the-art diffusion models to generate visually diverse yet consistent storyboards. To structure generation, we use an off-the-shelf language model to convert free-form stories into grounded panel-level prompts. To evaluate, we propose the Rich Storyboard Benchmark, a suite of open-domain narratives designed to assess layout diversity and background-grounded storytelling, in addition to consistency. We also introduce a new Scene Diversity metric that quantifies spatial and pose variation across storyboards. Our qualitative and quantitative results, as well as a user study, show that Story2Board produces more dynamic, coherent, and narratively engaging storyboards than existing baselines.
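
The abstract describes Reciprocal Attention Value Mixing only at a high level: visual features are softly blended between token pairs that attend strongly to each other. The following is a minimal illustrative sketch of that idea, not the authors' implementation. The function name, the use of `min` over the two attention directions as the reciprocity score, the `threshold` cutoff, and the blend weight `lam` are all assumptions made for illustration.

```python
def reciprocal_value_mixing(attn, values, ref_idx, tgt_idx, lam=0.5, threshold=0.1):
    """Blend each target token's value vector toward its strongest reciprocal partner.

    attn[i][j] is the attention weight from token i to token j.
    ref_idx / tgt_idx are the token indices of the reference and target panels.
    All parameter names and defaults here are hypothetical.
    """
    mixed = [list(v) for v in values]  # copy so the input is left untouched
    for j in tgt_idx:
        best_i, best_score = None, threshold
        for i in ref_idx:
            # Assumed reciprocity score: both directions must be strong.
            score = min(attn[i][j], attn[j][i])
            if score > best_score:
                best_i, best_score = i, score
        if best_i is not None:
            # Soft blend of the two value vectors with weight lam.
            mixed[j] = [(1 - lam) * vj + lam * vi
                        for vj, vi in zip(values[j], values[best_i])]
    return mixed


# Toy example: tokens 0-1 belong to the reference panel, tokens 2-3 to the target.
attn = [
    [0.25, 0.10, 0.60, 0.05],
    [0.10, 0.78, 0.10, 0.02],
    [0.50, 0.20, 0.20, 0.10],
    [0.90, 0.05, 0.00, 0.05],
]
values = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [2.0, 2.0]]
mixed = reciprocal_value_mixing(attn, values, ref_idx=[0, 1], tgt_idx=[2, 3])
```

In this toy example, token 2 and token 0 attend to each other strongly, so token 2's value is pulled halfway toward token 0's. Token 3 attends to token 0 but is not attended to in return, so it is left unchanged, which is the sense in which the mixing is reciprocal.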