ShapeR: Robust Conditional 3D Shape Generation from Casual Captures

January 16, 2026
Authors: Yawar Siddiqui, Duncan Frost, Samir Aroudj, Armen Avetisyan, Henry Howard-Jenkins, Daniel DeTone, Pierre Moulon, Qirui Wu, Zhengqin Li, Julian Straub, Richard Newcombe, Jakob Engel
cs.AI

Abstract

Recent advances in 3D shape generation have achieved impressive results, but most existing methods rely on clean, unoccluded, and well-segmented inputs. Such conditions are rarely met in real-world scenarios. We present ShapeR, a novel approach for conditional 3D object shape generation from casually captured sequences. Given an image sequence, we leverage off-the-shelf visual-inertial SLAM, 3D detection algorithms, and vision-language models to extract, for each object, a set of sparse SLAM points, posed multi-view images, and machine-generated captions. A rectified flow transformer trained to condition on these modalities then generates high-fidelity metric 3D shapes. To ensure robustness to the challenges of casually captured data, we employ on-the-fly compositional augmentations, a curriculum training scheme spanning object- and scene-level datasets, and strategies to handle background clutter. Additionally, we introduce a new evaluation benchmark comprising 178 in-the-wild objects across 7 real-world scenes with geometry annotations. Experiments show that ShapeR significantly outperforms existing approaches in this challenging setting, achieving a 2.7x improvement in Chamfer distance over the state of the art.
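
The abstract reports its headline result in Chamfer distance. As a minimal sketch of what that metric measures, the snippet below computes a symmetric Chamfer distance between two small 3D point sets in plain Python. The convention used here (mean unsquared nearest-neighbor distance, summed over both directions) is an assumption; the abstract does not say which variant or normalization ShapeR's benchmark uses, and the function name and sample points are purely illustrative.

```python
import math


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two non-empty 3D point sets.

    Assumed convention (not taken from the paper): for each direction,
    average the unsquared Euclidean distance from every point to its
    nearest neighbor in the other set, then add the two directions.
    """
    if not points_a or not points_b:
        raise ValueError("both point sets must be non-empty")

    def one_sided(src, dst):
        # Brute-force nearest neighbor: O(len(src) * len(dst)).
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_sided(points_a, points_b) + one_sided(points_b, points_a)


# Illustrative points only; these are not data from the benchmark.
generated = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
print(chamfer_distance(generated, reference))  # 1.0
```

Lower values mean the generated surface samples lie closer to the reference geometry. The brute-force search is fine for a toy example; real meshes are usually sampled densely, where a spatial index such as a k-d tree would be the more practical choice.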