

Frame In-N-Out: Unbounded Controllable Image-to-Video Generation

May 27, 2025
作者: Boyang Wang, Xuweiyi Chen, Matheus Gadelha, Zezhou Cheng
cs.AI

Abstract

Controllability, temporal coherence, and detail synthesis remain the most critical challenges in video generation. In this paper, we focus on a commonly used yet underexplored cinematic technique known as Frame In and Frame Out. Specifically, starting from image-to-video generation, users can control objects in the image to naturally leave the scene, or introduce brand-new identity references to enter the scene, guided by a user-specified motion trajectory. To support this task, we introduce a new, semi-automatically curated dataset, a comprehensive evaluation protocol targeting this setting, and an efficient identity-preserving, motion-controllable video Diffusion Transformer architecture. Our evaluation shows that the proposed approach significantly outperforms existing baselines.
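The abstract describes conditioning generation on a user-specified motion trajectory that may extend beyond the image canvas (so objects can enter or leave the frame). The paper does not detail its conditioning scheme, but one common way to feed a trajectory to a video diffusion model is to rasterize it into per-frame Gaussian heatmaps; the sketch below is a hypothetical illustration of that idea, not the authors' implementation. The function name `trajectory_to_heatmaps` and all parameters are assumptions.

```python
import numpy as np

def trajectory_to_heatmaps(traj, num_frames, height, width, sigma=4.0):
    """Rasterize a sparse (x, y) trajectory into per-frame Gaussian heatmaps.

    Hypothetical conditioning scheme: points may lie outside the canvas
    (negative or beyond the bounds) to express frame-in / frame-out motion;
    an off-screen point still leaves the tail of its Gaussian inside the
    frame, so the model sees the object fading out rather than vanishing.
    """
    traj = np.asarray(traj, dtype=np.float32)          # (K, 2) control points
    # Linearly interpolate the sparse trajectory to one point per frame.
    t_src = np.linspace(0.0, 1.0, len(traj))
    t_dst = np.linspace(0.0, 1.0, num_frames)
    xs = np.interp(t_dst, t_src, traj[:, 0])
    ys = np.interp(t_dst, t_src, traj[:, 1])

    yy, xx = np.mgrid[0:height, 0:width].astype(np.float32)
    maps = np.empty((num_frames, height, width), dtype=np.float32)
    for f in range(num_frames):
        d2 = (xx - xs[f]) ** 2 + (yy - ys[f]) ** 2
        maps[f] = np.exp(-d2 / (2.0 * sigma ** 2))     # peak 1.0 at the point
    return maps
```

For a frame-out motion, the final trajectory point sits past the right edge, so the last heatmap's peak value drops below 1.0 while remaining nonzero near the border, giving the model a smooth exit signal.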

