StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors
December 18, 2025
Authors: Guibao Shen, Yihua Du, Wenhang Ge, Jing He, Chirui Chang, Donghao Zhou, Zhen Yang, Luozhou Wang, Xin Tao, Ying-Cong Chen
cs.AI
Abstract
The rapid growth of stereoscopic displays, including VR headsets and 3D cinemas, has led to increasing demand for high-quality stereo video content. However, producing 3D videos remains costly and complex, while automatic Monocular-to-Stereo conversion is hindered by the limitations of the multi-stage "Depth-Warp-Inpaint" (DWI) pipeline. This paradigm suffers from error propagation, depth ambiguity, and format inconsistency between parallel and converged stereo configurations. To address these challenges, we introduce UniStereo, the first large-scale unified dataset for stereo video conversion, covering both stereo formats to enable fair benchmarking and robust model training. Building upon this dataset, we propose StereoPilot, an efficient feed-forward model that directly synthesizes the target view without relying on explicit depth maps or iterative diffusion sampling. Equipped with a learnable domain switcher and a cycle consistency loss, StereoPilot adapts seamlessly to different stereo formats and achieves improved consistency. Extensive experiments demonstrate that StereoPilot significantly outperforms state-of-the-art methods in both visual fidelity and computational efficiency. Project page: https://hit-perfect.github.io/StereoPilot/.
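To make the abstract's two key ingredients concrete, the sketch below illustrates one plausible reading of a feed-forward converter that is conditioned on a learnable format token (the "domain switcher") and trained with a cycle consistency term, with no depth estimation or diffusion sampling in the loop. This is a toy PyTorch example under assumed design choices; the module names, the embedding-based conditioning, and the exact loss formulation are illustrative and are not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical encodings for conversion direction and stereo format.
L2R, R2L = 0, 1
PARALLEL, CONVERGED = 0, 1


class ToyStereoConverter(nn.Module):
    """Illustrative feed-forward converter: maps one eye's frame directly to
    the other eye's frame, conditioned on a learnable format token and a
    conversion direction, with no depth map or diffusion sampling."""

    def __init__(self, channels=64):
        super().__init__()
        self.format_embed = nn.Embedding(2, channels)     # parallel / converged
        self.direction_embed = nn.Embedding(2, channels)  # left->right / right->left
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, frame, fmt, direction):
        feat = self.encoder(frame)
        # Inject the format and direction embeddings as a per-channel bias.
        cond = (self.format_embed(fmt) + self.direction_embed(direction))[:, :, None, None]
        return self.decoder(feat + cond)


def training_step(model, left, right, fmt):
    """One hypothetical training step: supervised reconstruction of the right
    view plus a cycle-consistency term (left -> predicted right -> left)."""
    pred_right = model(left, fmt, torch.full_like(fmt, L2R))
    recon_left = model(pred_right, fmt, torch.full_like(fmt, R2L))
    return F.l1_loss(pred_right, right) + F.l1_loss(recon_left, left)


if __name__ == "__main__":
    model = ToyStereoConverter()
    left, right = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
    fmt = torch.tensor([PARALLEL, CONVERGED])  # dummy per-sample format labels
    print(training_step(model, left, right, fmt).item())
```

The intent of the cycle term in this sketch is that converting the predicted target view back to the source view should recover the input, which encourages the kind of consistency the abstract describes; the actual StereoPilot architecture and losses are detailed on the project page and in the paper.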