TaleCrafter: Interactive Story Visualization with Multiple Characters

May 29, 2023
Authors: Yuan Gong, Youxin Pang, Xiaodong Cun, Menghan Xia, Haoxin Chen, Longyue Wang, Yong Zhang, Xintao Wang, Ying Shan, Yujiu Yang
cs.AI

Abstract

Accurate story visualization requires several elements: identity consistency across frames, alignment between plain text and visual content, and a reasonable layout of objects in images. Most previous work attempts to meet these requirements by fitting a text-to-image (T2I) model on a set of videos in the same style and with the same characters, e.g., the FlintstonesSV dataset. However, the learned T2I models typically struggle to adapt to new characters, scenes, and styles, and often lack the flexibility to revise the layout of the synthesized images. This paper proposes a system for generic interactive story visualization, capable of handling multiple novel characters and supporting the editing of layout and local structure. It is developed by leveraging the prior knowledge of large language models and T2I models trained on massive corpora. The system comprises four interconnected components: story-to-prompt generation (S2P), text-to-layout generation (T2L), controllable text-to-image generation (C-T2I), and image-to-video animation (I2V). First, the S2P module converts concise story information into the detailed prompts required by subsequent stages. Next, T2L generates diverse and reasonable layouts based on the prompts, and users can adjust and refine these layouts to their preference. The core component, C-T2I, creates images guided by layouts, sketches, and actor-specific identifiers to maintain consistency and detail across visualizations. Finally, I2V enriches the visualization by animating the generated images. Extensive experiments and a user study are conducted to validate the effectiveness and flexibility of interactive editing in the proposed system.
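
The data flow between the four components can be sketched roughly as follows. This is a minimal illustration of the stage ordering described in the abstract, not the authors' implementation: every function name, the `Layout` and `Frame` types, and the `edit_layout` hook are hypothetical, and each stage is a trivial placeholder standing in for a large language model, a layout generator, a controllable T2I model, or an animation model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Normalized bounding box (x0, y0, x1, y1); an assumed representation.
Box = Tuple[float, float, float, float]


@dataclass
class Layout:
    boxes: Dict[str, Box]  # character name -> box


@dataclass
class Frame:
    prompt: str
    layout: Layout
    image: str  # placeholder string standing in for image data
    clip: str   # placeholder string standing in for an animated clip


def story_to_prompts(story: str) -> List[str]:
    """S2P placeholder: one prompt per sentence (a real system would query an LLM)."""
    return [s.strip() for s in story.split(".") if s.strip()]


def text_to_layout(prompt: str, characters: List[str]) -> Layout:
    """T2L placeholder: place each mentioned character in an equal-width column."""
    present = [c for c in characters if c in prompt]
    n = len(present)
    boxes = {c: (i / n, 0.25, (i + 1) / n, 0.75) for i, c in enumerate(present)}
    return Layout(boxes)


def controllable_t2i(prompt: str, layout: Layout) -> str:
    """C-T2I placeholder: describe the image instead of generating it."""
    names = ", ".join(layout.boxes) or "no characters"
    return f"image[{prompt} | {names}]"


def image_to_video(image: str) -> str:
    """I2V placeholder: wrap the image description as an animated clip."""
    return f"clip[{image}]"


def visualize_story(
    story: str,
    characters: List[str],
    edit_layout: Optional[Callable[[int, Layout], Layout]] = None,
) -> List[Frame]:
    """Run S2P -> T2L -> (optional user edit) -> C-T2I -> I2V for each prompt."""
    frames = []
    for index, prompt in enumerate(story_to_prompts(story)):
        layout = text_to_layout(prompt, characters)
        if edit_layout is not None:
            # Interactive step: the user may adjust the layout before rendering.
            layout = edit_layout(index, layout)
        image = controllable_t2i(prompt, layout)
        frames.append(Frame(prompt, layout, image, image_to_video(image)))
    return frames
```

Calling `visualize_story("Tom meets Ana in the park. Ana waves goodbye.", ["Tom", "Ana"])` yields one `Frame` per sentence. Passing an `edit_layout` callback models the interactive step in which a user revises a generated layout before the image is rendered.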