

Frame In-N-Out: Unbounded Controllable Image-to-Video Generation

May 27, 2025
作者: Boyang Wang, Xuweiyi Chen, Matheus Gadelha, Zezhou Cheng
cs.AI

Abstract
Controllability, temporal coherence, and detail synthesis remain the most critical challenges in video generation. In this paper, we focus on a commonly used yet underexplored cinematic technique known as Frame In and Frame Out. Specifically, starting from image-to-video generation, users can control objects in the image to naturally leave the scene, or introduce entirely new identity references that enter the scene, guided by a user-specified motion trajectory. To support this task, we introduce a new semi-automatically curated dataset, a comprehensive evaluation protocol targeting this setting, and an efficient identity-preserving, motion-controllable video Diffusion Transformer architecture. Our evaluation shows that the proposed approach significantly outperforms existing baselines.
