
YingVideo-MV: Music-Driven Multi-Stage Video Generation

December 2, 2025
Authors: Jiahui Chen, Weida Wang, Runhua Shi, Huan Yang, Chaofan Ding, Zihao Chen
cs.AI

Abstract

While diffusion models for audio-driven avatar video generation have made notable progress in synthesizing long sequences with natural audio-visual synchronization and identity consistency, the generation of music-performance videos with camera motion remains largely unexplored. We present YingVideo-MV, the first cascaded framework for music-driven long-video generation. Our approach integrates audio semantic analysis, an interpretable shot-planning module (MV-Director), temporal-aware diffusion Transformer architectures, and long-sequence consistency modeling to automatically synthesize high-quality music-performance videos from audio signals. To support diverse, high-quality results, we construct a large-scale Music-in-the-Wild Dataset from collected web data. Observing that existing long-video generation methods lack explicit camera motion control, we introduce a camera adapter module that embeds camera poses into the latent noise. To improve continuity between clips during long-sequence inference, we further propose a time-aware dynamic window range strategy that adaptively adjusts denoising ranges based on audio embeddings. Comprehensive benchmark tests demonstrate that YingVideo-MV generates coherent, expressive music videos and achieves precise music-motion-camera synchronization. More videos are available on our project page: https://giantailab.github.io/YingVideo-MV/