ReCamDriving: LiDAR-Free Camera-Controlled Novel Trajectory Video Generation
December 3, 2025
Authors: Yaokun Li, Shuaixian Wang, Mantang Guo, Jiehui Huang, Taojun Ding, Mu Hu, Kaixuan Wang, Shaojie Shen, Guang Tan
cs.AI
Abstract
We propose ReCamDriving, a purely vision-based, camera-controlled framework for novel-trajectory video generation. Whereas repair-based methods struggle to correct complex rendering artifacts and LiDAR-based approaches rely on sparse, incomplete cues, ReCamDriving leverages dense, scene-complete 3D Gaussian Splatting (3DGS) renderings as explicit geometric guidance, enabling precise camera-controllable generation. To mitigate overfitting to restoration behaviors when conditioning on 3DGS renderings, ReCamDriving adopts a two-stage training paradigm: the first stage uses camera poses for coarse trajectory control, while the second stage incorporates 3DGS renderings for fine-grained viewpoint and geometric guidance. Furthermore, we present a 3DGS-based cross-trajectory data curation strategy that eliminates the train-test gap in camera-transformation patterns, enabling scalable multi-trajectory supervision from monocular videos. Based on this strategy, we construct ParaDrive, a dataset of over 110K parallel-trajectory video pairs. Extensive experiments demonstrate that ReCamDriving achieves state-of-the-art camera controllability and structural consistency.
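To make the two-stage paradigm concrete, below is a minimal PyTorch sketch of how pose-only conditioning (stage 1) could be extended with 3DGS-rendering conditioning (stage 2). All module names, tensor shapes, and the fusion-by-addition choice are illustrative assumptions; the paper's actual backbone and conditioning interface are not specified in the abstract.

```python
import torch
import torch.nn as nn

class PoseEncoder(nn.Module):
    """Embeds per-frame camera poses (flattened 3x4 extrinsics) into tokens."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(12, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, poses: torch.Tensor) -> torch.Tensor:  # (B, T, 12)
        return self.mlp(poses)                               # (B, T, dim)

class RenderEncoder(nn.Module):
    """Encodes 3DGS renderings of the target trajectory into per-frame tokens."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.patchify = nn.Conv2d(3, dim, kernel_size=16, stride=16)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:  # (B, T, 3, H, W)
        b, t, c, h, w = frames.shape
        feats = self.patchify(frames.reshape(b * t, c, h, w))  # (B*T, dim, h', w')
        return feats.flatten(2).mean(-1).reshape(b, t, -1)     # pooled: (B, T, dim)

class TwoStageConditioner(nn.Module):
    """Hypothetical conditioning module for a video-generation backbone.

    Stage 1: camera poses only (coarse trajectory control).
    Stage 2: poses plus 3DGS renderings (fine viewpoint/geometry guidance).
    """
    def __init__(self, dim: int = 256):
        super().__init__()
        self.pose_enc = PoseEncoder(dim)
        self.render_enc = RenderEncoder(dim)

    def forward(self, poses, renders=None):
        cond = self.pose_enc(poses)
        if renders is not None:          # enabled only in the second stage
            cond = cond + self.render_enc(renders)
        return cond                      # conditioning tokens for the generator

# Stage 1: train with pose conditioning alone.
cond_net = TwoStageConditioner()
poses = torch.randn(2, 8, 12)            # B=2 clips, T=8 frames
stage1_tokens = cond_net(poses)

# Stage 2: add dense 3DGS renderings of the novel trajectory.
renders = torch.randn(2, 8, 3, 64, 64)
stage2_tokens = cond_net(poses, renders)
print(stage1_tokens.shape, stage2_tokens.shape)  # both (2, 8, 256)
```

Staging the conditions this way reflects the abstract's motivation: if the dense renderings were available from the start, the model could learn to merely "repair" them, so coarse pose control is learned first.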
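The cross-trajectory curation pairs each source clip with 3DGS renderings along a shifted camera path. Below is a minimal numpy sketch of constructing one such parallel trajectory, assuming camera-to-world extrinsics and a fixed lateral offset; how the paper actually samples offsets, and the 3DGS rendering step itself, are omitted here.

```python
import numpy as np

def parallel_trajectory(c2w: np.ndarray, lateral_offset: float) -> np.ndarray:
    """Shift each camera-to-world pose sideways along its own right axis.

    c2w: (T, 4, 4) camera-to-world matrices of the source trajectory.
    Returns poses for a parallel trajectory offset by `lateral_offset` meters.
    """
    shifted = c2w.copy()
    right = c2w[:, :3, 0]                        # x-axis of each camera frame
    shifted[:, :3, 3] += lateral_offset * right  # translate along that axis
    return shifted

# Toy source trajectory: 8 identity-oriented poses moving forward along z.
src = np.tile(np.eye(4), (8, 1, 1))
src[:, 2, 3] = np.arange(8)
tgt = parallel_trajectory(src, lateral_offset=2.0)
# Rendering the reconstructed 3DGS scene at `tgt` would yield the paired
# novel-trajectory video (rendering step is library-dependent and omitted).
print(tgt[0, :3, 3])                             # [2. 0. 0.]
```

Because the target views are rendered from a reconstruction of the same scene, pairs like these can be produced at scale from ordinary monocular videos, which is how a 110K-pair dataset such as ParaDrive becomes feasible.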