
LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation

August 5, 2025
Authors: Jianxiong Gao, Zhaoxi Chen, Xian Liu, Jianfeng Feng, Chenyang Si, Yanwei Fu, Yu Qiao, Ziwei Liu
cs.AI

Abstract

Controllable ultra-long video generation is a fundamental yet challenging task. Although existing methods are effective for short clips, they struggle to scale to longer videos because of temporal inconsistency and visual degradation. In this paper, we first investigate and identify three key factors: separate noise initialization, independent control signal normalization, and the limitations of single-modality guidance. To address these issues, we propose LongVie, an end-to-end autoregressive framework for controllable long video generation. LongVie introduces two core designs to ensure temporal consistency: 1) a unified noise initialization strategy that maintains consistent generation across clips, and 2) global control signal normalization that enforces alignment in the control space throughout the entire video. To mitigate visual degradation, LongVie employs 3) a multi-modal control framework that integrates both dense (e.g., depth maps) and sparse (e.g., keypoints) control signals, complemented by 4) a degradation-aware training strategy that adaptively balances modality contributions over time to preserve visual quality. We also introduce LongVGenBench, a comprehensive benchmark consisting of 100 high-resolution videos spanning diverse real-world and synthetic environments, each lasting over one minute. Extensive experiments show that LongVie achieves state-of-the-art performance in long-range controllability, consistency, and quality.
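
The two temporal-consistency ideas named in the abstract can be illustrated with a toy sketch. This is not the LongVie implementation: the abstract gives no algorithmic detail, so the code assumes that "global control signal normalization" means min-max scaling with statistics taken over the whole video rather than per clip, and that "unified noise initialization" means every clip starts from the same noise sample. All function names are hypothetical, and a flat list of numbers stands in for a per-frame depth signal.

```python
import random

def split_into_clips(values, clip_len):
    """Cut a per-frame control signal into consecutive clips."""
    return [values[i:i + clip_len] for i in range(0, len(values), clip_len)]

def normalize_per_clip(clips):
    """Independent normalization: each clip is scaled by its own min and max."""
    out = []
    for clip in clips:
        lo, hi = min(clip), max(clip)
        scale = (hi - lo) or 1.0  # avoid division by zero on constant clips
        out.append([(v - lo) / scale for v in clip])
    return out

def normalize_globally(clips):
    """Assumed global normalization: one min and max shared by all clips."""
    flat = [v for clip in clips for v in clip]
    lo, hi = min(flat), max(flat)
    scale = (hi - lo) or 1.0
    return [[(v - lo) / scale for v in clip] for clip in clips]

def clip_noise(num_values, seed):
    """Draw Gaussian noise for one clip from a seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_values)]

# Toy "depth" signal that increases steadily over 8 frames, cut into 2 clips.
depth = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
clips = split_into_clips(depth, clip_len=4)

per_clip = normalize_per_clip(clips)     # both clips collapse to the same range
global_norm = normalize_globally(clips)  # the second clip continues the first

# Assumed unified initialization: the same seed, hence the same noise, per clip.
shared_noise = [clip_noise(4, seed=0) for _ in clips]
# Separate initialization: a different seed for every clip.
separate_noise = [clip_noise(4, seed=i) for i, _ in enumerate(clips)]
```

With per-clip scaling, the two clips become indistinguishable even though the underlying depth keeps increasing, which is one way a control signal can drift out of alignment at clip boundaries. The globally scaled version keeps the second clip's values above the first's.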
August 6, 2025