GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors

April 1, 2025
Authors: Tian-Xing Xu, Xiangjun Gao, Wenbo Hu, Xiaoyu Li, Song-Hai Zhang, Ying Shan
cs.AI

Abstract

Despite remarkable advances in video depth estimation, existing methods are inherently limited in the geometric fidelity they can achieve with affine-invariant predictions, which restricts their applicability to reconstruction and other metrically grounded downstream tasks. We propose GeometryCrafter, a novel framework that recovers high-fidelity, temporally coherent point map sequences from open-world videos, enabling accurate 3D/4D reconstruction, camera parameter estimation, and other depth-based applications. At the core of our approach is a point map Variational Autoencoder (VAE) that learns a latent space agnostic to video latent distributions for effective point map encoding and decoding. Leveraging this VAE, we train a video diffusion model to model the distribution of point map sequences conditioned on the input videos. Extensive evaluations on diverse datasets demonstrate that GeometryCrafter achieves state-of-the-art 3D accuracy, temporal consistency, and generalization capability.
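
The limitation of affine-invariant predictions mentioned in the abstract can be illustrated with a small sketch. This is not the GeometryCrafter method. It assumes that "affine-invariant" means depth is known only up to an unknown scale and shift, and it uses a hypothetical pinhole camera (`fx`, `fy`, `cx`, `cy`) and synthetic data. Under those assumptions, unprojecting the raw prediction gives a distorted point map, and the correct one is recovered only after the scale and shift are fitted against reference depth.

```python
import numpy as np

def align_scale_shift(pred, target):
    """Least-squares scale s and shift t minimizing ||s * pred + t - target||^2."""
    A = np.stack([pred.ravel(), np.ones(pred.size)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, target.ravel(), rcond=None)
    return s, t

def unproject(depth, fx, fy, cx, cy):
    """Lift a depth map to a per-pixel point map (H, W, 3) with a pinhole model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)

# Synthetic metric depth: a tilted plane (illustrative data, not from the paper).
h, w = 4, 6
u, v = np.meshgrid(np.arange(w), np.arange(h))
depth_true = 2.0 + 0.1 * u + 0.05 * v

# Hypothetical affine-invariant prediction: true depth under an unknown scale and shift.
pred = 0.5 * depth_true - 0.3

# Hypothetical camera intrinsics.
fx = fy = 5.0
cx, cy = 2.5, 1.5

points_true = unproject(depth_true, fx, fy, cx, cy)
points_raw = unproject(pred, fx, fy, cx, cy)

# Because of the shift, the raw point map is not a uniformly scaled copy of the true one:
# the per-pixel depth ratio varies across the image.
ratio = points_raw[..., 2] / points_true[..., 2]

# With reference depth available, scale and shift can be fitted and the geometry restored.
s, t = align_scale_shift(pred, depth_true)
points_aligned = unproject(s * pred + t, fx, fy, cx, cy)
```

The fit needs reference depth, which is generally unavailable for open-world video. That is one way to read the abstract's motivation for predicting point maps directly.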
