Story2Board: A Training-Free Approach for Expressive Storyboard Generation
August 13, 2025
Authors: David Dinkevich, Matan Levy, Omri Avrahami, Dvir Samuel, Dani Lischinski
cs.AI
Abstract
We present Story2Board, a training-free framework for expressive storyboard
generation from natural language. Existing methods narrowly focus on subject
identity, overlooking key aspects of visual storytelling such as spatial
composition, background evolution, and narrative pacing. To address this, we
introduce a lightweight consistency framework composed of two components:
Latent Panel Anchoring, which preserves a shared character reference across
panels, and Reciprocal Attention Value Mixing, which softly blends visual
features between token pairs with strong reciprocal attention. Together, these
mechanisms enhance coherence without architectural changes or fine-tuning,
enabling state-of-the-art diffusion models to generate visually diverse yet
consistent storyboards. To structure generation, we use an off-the-shelf
language model to convert free-form stories into grounded panel-level prompts.
To evaluate, we propose the Rich Storyboard Benchmark, a suite of open-domain
narratives designed to assess layout diversity and background-grounded
storytelling, in addition to consistency. We also introduce a new Scene
Diversity metric that quantifies spatial and pose variation across storyboards.
Our qualitative and quantitative results, as well as a user study, show that
Story2Board produces more dynamic, coherent, and narratively engaging
storyboards than existing baselines.
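
The abstract describes Reciprocal Attention Value Mixing only at a high level, so the following is a rough NumPy sketch of the general idea rather than the authors' implementation. It assumes a single attention matrix over the tokens of a reference panel and a target panel, scores each target/reference token pair by the smaller of its two attention directions, and blends the value vectors of pairs whose score passes a threshold. The function name, the min-based score, the `threshold` and `mix` parameters, and the token index layout are all hypothetical choices made for illustration.

```python
import numpy as np


def reciprocal_value_mixing(attn, values, ref_idx, tgt_idx, threshold=0.3, mix=0.5):
    """Illustrative sketch only; every name and default here is an assumption.

    attn:    (N, N) attention weights, attn[i, j] = how much token i attends to token j
    values:  (N, D) value vectors
    ref_idx: indices of tokens belonging to the reference panel
    tgt_idx: indices of tokens belonging to the target panel
    """
    ref_idx = np.asarray(ref_idx)
    tgt_idx = np.asarray(tgt_idx)

    # Attention in both directions between the two panels.
    tgt_to_ref = attn[np.ix_(tgt_idx, ref_idx)]      # shape (T, R)
    ref_to_tgt = attn[np.ix_(ref_idx, tgt_idx)].T    # shape (T, R) after transpose

    # Assumed reciprocity score: large only when both directions are large.
    reciprocal = np.minimum(tgt_to_ref, ref_to_tgt)

    # For each target token, pick its most reciprocal reference partner.
    best_ref = reciprocal.argmax(axis=1)
    score = reciprocal[np.arange(len(tgt_idx)), best_ref]
    strong = score >= threshold

    # Softly blend values for strongly reciprocal pairs; leave the rest untouched.
    mixed = values.astype(float)
    tgt_sel = tgt_idx[strong]
    ref_sel = ref_idx[best_ref[strong]]
    mixed[tgt_sel] = (1.0 - mix) * mixed[tgt_sel] + mix * mixed[ref_sel]
    return mixed
```

In this sketch only the target-panel tokens are modified, and `mix` controls how far they are pulled toward their reference partner. A real diffusion transformer would apply something like this per attention head and per layer, which the sketch leaves out.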