NU-MCC: Multiview Compressive Coding with Neighborhood Decoder and Repulsive UDF
July 18, 2023
Authors: Stefan Lionar, Xiangyu Xu, Min Lin, Gim Hee Lee
cs.AI
Abstract
Remarkable progress has been made in 3D reconstruction from single-view RGB-D
inputs. MCC, the current state-of-the-art method in this field, achieves
unprecedented success by combining vision Transformers with large-scale
training. However, we identify two key limitations of MCC: 1) The Transformer
decoder is inefficient at handling a large number of query points; 2)
The 3D representation struggles to recover high-fidelity details. In this
paper, we propose a new approach called NU-MCC that addresses these
limitations. NU-MCC includes two key innovations: a Neighborhood decoder and a
Repulsive Unsigned Distance Function (Repulsive UDF). First, our Neighborhood
decoder introduces center points as an efficient proxy of input visual
features, allowing each query point to only attend to a small neighborhood.
This design not only results in much faster inference speed but also enables
the exploitation of finer-scale visual features for improved recovery of 3D
textures. Second, our Repulsive UDF is a novel alternative to the occupancy
field used in MCC, significantly improving the quality of 3D object
reconstruction. Compared to standard UDFs, whose results suffer from holes,
our proposed Repulsive UDF achieves more complete surface reconstruction.
Experimental results demonstrate that NU-MCC is able to learn a strong 3D
representation, significantly advancing the state of the art in single-view 3D
reconstruction. In particular, it outperforms MCC by 9.7% in F1-score on the
CO3D-v2 dataset while running more than 5x faster.
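
The abstract says that each query point attends only to a small neighborhood of center points rather than to every input feature. The sketch below illustrates that general idea in plain Python. It is not the authors' decoder: the function names (knn_indices, neighborhood_attend), the use of k-nearest centers as the neighborhood, and the unprojected dot-product attention are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math

def knn_indices(query, centers, k):
    """Return indices of the k centers closest to `query` (Euclidean distance)."""
    order = sorted(range(len(centers)), key=lambda i: math.dist(query, centers[i]))
    return order[:k]

def neighborhood_attend(query, query_feat, centers, center_feats, k=2):
    """Attend from one query point to its k nearest centers only.

    Assumption: plain scaled dot-product attention with no learned
    projections; the real NU-MCC decoder is not described in the abstract.
    """
    idx = knn_indices(query, centers, k)
    scale = math.sqrt(len(query_feat))
    # Attention scores are computed against the k neighbors, not all centers.
    scores = [sum(q * c for q, c in zip(query_feat, center_feats[i])) / scale
              for i in idx]
    # Numerically stable softmax over the neighborhood.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Output feature is the attention-weighted mean of neighbor features.
    dim = len(center_feats[0])
    out = [sum(w * center_feats[i][d] for w, i in zip(weights, idx))
           for d in range(dim)]
    return idx, weights, out

# Toy data (hypothetical): four centers in 3D, each with a 2-D feature.
centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 5.0, 0.0), (5.0, 5.0, 5.0)]
center_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
idx, weights, out = neighborhood_attend((0.2, 0.0, 0.0), [1.0, 0.0],
                                        centers, center_feats, k=2)
```

With k fixed, the attention step costs O(k) per query instead of O(N) over all N centers, which is the kind of saving the abstract attributes to the Neighborhood decoder. The brute-force neighbor search here still scans every center; a practical implementation would presumably use a spatial index or a batched nearest-neighbor routine.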