DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation

November 28, 2025
Authors: Hongfei Zhang, Kanghao Chen, Zixin Zhang, Harold Haodong Chen, Yuanhuiyi Lyu, Yuqi Zhang, Shuai Yang, Kun Zhou, Yingcong Chen
cs.AI

Abstract

This paper presents DualCamCtrl, a novel end-to-end diffusion model for camera-controlled video generation. Recent works have advanced this field by representing camera poses as ray-based conditions, yet they often lack sufficient scene understanding and geometric awareness. DualCamCtrl targets this limitation with a dual-branch framework that jointly generates camera-consistent RGB and depth sequences. To harmonize the two modalities, we further propose the Semantic Guided Mutual Alignment (SIGMA) mechanism, which performs RGB-depth fusion in a semantics-guided and mutually reinforcing manner. Together, these designs enable DualCamCtrl to better disentangle appearance and geometry modeling, generating videos that adhere more faithfully to the specified camera trajectories. Additionally, we analyze the distinct influence of depth and camera poses across denoising stages and show that early and late stages play complementary roles: early stages form the global structure, while late stages refine local details. Extensive experiments demonstrate that DualCamCtrl achieves more consistent camera-controlled video generation, reducing camera motion errors by over 40% compared with prior methods. Project page: https://soyouthinkyoucantell.github.io/dualcamctrl-page/
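
To make "ray-based conditions" concrete, here is a minimal NumPy sketch of one common encoding used in prior camera-control work: a per-pixel Plücker ray map built from camera intrinsics and a camera-to-world pose. The abstract does not state which ray encoding DualCamCtrl uses, so the Plücker form, the function name `plucker_ray_map`, and the parameter layout are all assumptions made for illustration.

```python
import numpy as np


def plucker_ray_map(K, c2w, height, width):
    """Hypothetical helper: per-pixel Plücker ray map for one camera.

    K      : (3, 3) pinhole intrinsics (assumed convention).
    c2w    : (4, 4) camera-to-world transform (assumed convention).
    Returns: (height, width, 6) array holding (moment, direction) per pixel,
             where moment = origin x direction.
    """
    # Pixel-center coordinates; u runs along width, v along height.
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)  # (H, W, 3)

    # Back-project pixels to ray directions in the camera frame.
    dirs_cam = pixels @ np.linalg.inv(K).T

    # Rotate directions into the world frame and normalize to unit length.
    dirs_world = dirs_cam @ c2w[:3, :3].T
    dirs_world = dirs_world / np.linalg.norm(dirs_world, axis=-1, keepdims=True)

    # All rays of one camera share the same origin (the camera center).
    origin = np.broadcast_to(c2w[:3, 3], dirs_world.shape)
    moment = np.cross(origin, dirs_world)

    return np.concatenate([moment, dirs_world], axis=-1)


# Usage: a toy 4x6 image with an assumed focal length and principal point.
K_example = np.array([[100.0, 0.0, 3.0],
                      [0.0, 100.0, 2.0],
                      [0.0, 0.0, 1.0]])
rays_example = plucker_ray_map(K_example, np.eye(4), height=4, width=6)
```

For a video, one such map would be computed per frame and stacked along the time axis, giving the model a dense, pixel-aligned description of the camera trajectory. How that tensor is injected into the diffusion backbone is model-specific and is not covered by this sketch.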