LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation

August 5, 2025
Authors: Jianxiong Gao, Zhaoxi Chen, Xian Liu, Jianfeng Feng, Chenyang Si, Yanwei Fu, Yu Qiao, Ziwei Liu
cs.AI

Abstract

Controllable ultra-long video generation is a fundamental yet challenging task. Although existing methods are effective for short clips, they struggle to scale to longer videos because of temporal inconsistency and visual degradation. In this paper, we first investigate and identify three key contributing factors: separate noise initialization, independent control signal normalization, and the limitations of single-modality guidance. To address these issues, we propose LongVie, an end-to-end autoregressive framework for controllable long video generation. LongVie introduces two core designs to ensure temporal consistency: 1) a unified noise initialization strategy that maintains consistent generation across clips, and 2) global control signal normalization that enforces alignment in the control space throughout the entire video. To mitigate visual degradation, LongVie employs 3) a multi-modal control framework that integrates both dense (e.g., depth maps) and sparse (e.g., keypoints) control signals, complemented by 4) a degradation-aware training strategy that adaptively balances modality contributions over time to preserve visual quality. We also introduce LongVGenBench, a comprehensive benchmark consisting of 100 high-resolution videos spanning diverse real-world and synthetic environments, each lasting over one minute. Extensive experiments show that LongVie achieves state-of-the-art performance in long-range controllability, consistency, and quality.
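
The two temporal-consistency ideas in the abstract can be illustrated with a toy sketch. This is not the LongVie implementation: the helper names (`normalize_per_clip`, `normalize_global`, `clip_noise`), the min-max normalization, the scalar-per-frame "depth" signal, and the reading of "unified noise initialization" as reusing one noise draw for every clip are all assumptions made for illustration.

```python
import random


def split_into_clips(values, clip_len):
    """Split a flat per-frame sequence into consecutive clips."""
    return [values[i:i + clip_len] for i in range(0, len(values), clip_len)]


def normalize_per_clip(depth, clip_len):
    """Independent normalization: each clip is rescaled by its own min/max."""
    out = []
    for clip in split_into_clips(depth, clip_len):
        lo, hi = min(clip), max(clip)
        scale = (hi - lo) or 1.0  # avoid division by zero on flat clips
        out.extend((d - lo) / scale for d in clip)
    return out


def normalize_global(depth):
    """Global normalization: one min/max computed over the whole video."""
    lo, hi = min(depth), max(depth)
    scale = (hi - lo) or 1.0
    return [(d - lo) / scale for d in depth]


def clip_noise(num_clips, dim, seed, unified):
    """Initial noise per clip: one shared draw (unified) or a fresh draw each."""
    rng = random.Random(seed)
    if unified:
        shared = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        return [list(shared) for _ in range(num_clips)]
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_clips)]


# Toy control signal (assumed): one depth value per frame, rising over 8 frames.
depth = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

per_clip = normalize_per_clip(depth, clip_len=4)
global_norm = normalize_global(depth)

# Per-clip scaling resets at the clip boundary (frame 4 -> frame 5),
# while global scaling keeps the signal monotonic across the boundary.
print("per-clip boundary:", per_clip[3], per_clip[4])
print("global boundary:  ", global_norm[3], global_norm[4])

unified = clip_noise(num_clips=2, dim=4, seed=0, unified=True)
separate = clip_noise(num_clips=2, dim=4, seed=0, unified=False)
print("unified clips share noise: ", unified[0] == unified[1])
print("separate clips share noise:", separate[0] == separate[1])
```

In this toy setting, per-clip normalization maps the same physical depth to different control values in adjacent clips, which produces a jump at the clip boundary; a single global range avoids that jump. Likewise, reusing one noise draw gives every clip the same starting point, whereas fresh draws do not.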