StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors

December 18, 2025
Authors: Guibao Shen, Yihua Du, Wenhang Ge, Jing He, Chirui Chang, Donghao Zhou, Zhen Yang, Luozhou Wang, Xin Tao, Ying-Cong Chen
cs.AI

Abstract

The rapid growth of stereoscopic displays, including VR headsets and 3D cinemas, has led to increasing demand for high-quality stereo video content. However, producing 3D videos remains costly and complex, while automatic monocular-to-stereo conversion is hindered by the limitations of the multi-stage "Depth-Warp-Inpaint" (DWI) pipeline. This paradigm suffers from error propagation, depth ambiguity, and format inconsistency between parallel and converged stereo configurations. To address these challenges, we introduce UniStereo, the first large-scale unified dataset for stereo video conversion, covering both stereo formats to enable fair benchmarking and robust model training. Building upon this dataset, we propose StereoPilot, an efficient feed-forward model that directly synthesizes the target view without relying on explicit depth maps or iterative diffusion sampling. Equipped with a learnable domain switcher and a cycle consistency loss, StereoPilot adapts seamlessly to different stereo formats and achieves improved consistency. Extensive experiments demonstrate that StereoPilot significantly outperforms state-of-the-art methods in both visual fidelity and computational efficiency. Project page: https://hit-perfect.github.io/StereoPilot/.