

StereoWorld: Geometry-Aware Monocular-to-Stereo Video Generation

December 10, 2025
作者: Ke Xing, Longfei Li, Yuyang Yin, Hanwen Liang, Guixun Luo, Chen Fang, Jue Wang, Konstantinos N. Plataniotis, Xiaojie Jin, Yao Zhao, Yunchao Wei
cs.AI

Abstract

The growing adoption of XR devices has fueled strong demand for high-quality stereo video, yet its production remains costly and artifact-prone. To address this challenge, we present StereoWorld, an end-to-end framework that repurposes a pretrained video generator for high-fidelity monocular-to-stereo video generation. Our framework jointly conditions the model on the monocular video input while explicitly supervising the generation with a geometry-aware regularization to ensure 3D structural fidelity. A spatio-temporal tiling scheme is further integrated to enable efficient, high-resolution synthesis. To enable large-scale training and evaluation, we curate a high-definition stereo video dataset containing over 11M frames aligned to natural human interpupillary distance (IPD). Extensive experiments demonstrate that StereoWorld substantially outperforms prior methods, generating stereo videos with superior visual fidelity and geometric consistency. The project webpage is available at https://ke-xing.github.io/StereoWorld/.
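The abstract does not detail how the spatio-temporal tiling scheme works. As a rough, hypothetical illustration of the general idea (not the paper's actual method), the sketch below splits a video array into overlapping spatio-temporal tiles so that each tile can be processed independently at high resolution; all tile sizes, overlaps, and function names are assumptions:

```python
import numpy as np

def spatiotemporal_tiles(video, tile=(8, 128, 128), overlap=(2, 16, 16)):
    """Yield overlapping (time, height, width) tiles of a video array.

    video: array of shape (T, H, W, C).
    tile:  tile extent along (T, H, W); values here are illustrative.
    overlap: how much adjacent tiles overlap along each axis, so that
             tile outputs can later be blended to hide seams.
    Yields ((t0, t1, y0, y1, x0, x1), tile_view) pairs.
    """
    T, H, W, _ = video.shape
    # Stride = tile size minus overlap (at least 1 to guarantee progress).
    steps = [max(t - o, 1) for t, o in zip(tile, overlap)]
    for t0 in range(0, T, steps[0]):
        for y0 in range(0, H, steps[1]):
            for x0 in range(0, W, steps[2]):
                # Clamp tile ends to the video bounds.
                t1 = min(t0 + tile[0], T)
                y1 = min(y0 + tile[1], H)
                x1 = min(x0 + tile[2], W)
                yield (t0, t1, y0, y1, x0, x1), video[t0:t1, y0:y1, x0:x1]
```

In a tiled-generation pipeline, each tile would be run through the generator and the overlapping regions blended (e.g. with a linear feathering weight) when stitching the full-resolution output back together.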