

NU-MCC: Multiview Compressive Coding with Neighborhood Decoder and Repulsive UDF

July 18, 2023
作者: Stefan Lionar, Xiangyu Xu, Min Lin, Gim Hee Lee
cs.AI

Abstract

Remarkable progress has been made in 3D reconstruction from single-view RGB-D inputs. MCC is the current state-of-the-art method in this field, which achieves unprecedented success by combining vision Transformers with large-scale training. However, we identified two key limitations of MCC: 1) the Transformer decoder is inefficient in handling a large number of query points; 2) the 3D representation struggles to recover high-fidelity details. In this paper, we propose a new approach called NU-MCC that addresses these limitations. NU-MCC includes two key innovations: a Neighborhood decoder and a Repulsive Unsigned Distance Function (Repulsive UDF). First, our Neighborhood decoder introduces center points as an efficient proxy for the input visual features, allowing each query point to attend to only a small neighborhood. This design not only results in much faster inference speed but also enables the exploitation of finer-scale visual features for improved recovery of 3D textures. Second, our Repulsive UDF is a novel alternative to the occupancy field used in MCC, significantly improving the quality of 3D object reconstruction. Compared to standard UDFs, which suffer from holes in the results, our proposed Repulsive UDF achieves more complete surface reconstruction. Experimental results demonstrate that NU-MCC is able to learn a strong 3D representation, significantly advancing the state of the art in single-view 3D reconstruction. In particular, it outperforms MCC by 9.7% in terms of the F1-score on the CO3D-v2 dataset while running more than 5x faster.
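The abstract describes the Neighborhood decoder only at a high level: each query point attends to a small set of center points that act as proxies for the encoded visual features. The sketch below is a minimal, generic illustration of that idea as k-nearest-neighbor cross-attention in PyTorch; the class name, tensor shapes, choice of k, and the scalar prediction head are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the neighborhood-attention idea from the abstract:
# each 3D query point attends only to the k nearest "center points" that act
# as proxies for the encoded visual features. Shapes and names are assumptions.
import torch
import torch.nn as nn


class NeighborhoodCrossAttention(nn.Module):
    def __init__(self, feat_dim: int = 256, num_heads: int = 4, k: int = 16):
        super().__init__()
        self.k = k
        self.query_embed = nn.Linear(3, feat_dim)   # lift xyz queries to feature space
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.out_head = nn.Linear(feat_dim, 1)      # e.g. a per-query scalar (distance/occupancy)

    def forward(self, queries, center_xyz, center_feat):
        # queries:     (B, Q, 3)  query point coordinates
        # center_xyz:  (B, C, 3)  center point coordinates (feature proxies)
        # center_feat: (B, C, F)  features attached to each center point
        B, Q, _ = queries.shape
        F = center_feat.shape[-1]

        # k nearest centers for every query point
        dists = torch.cdist(queries, center_xyz)                       # (B, Q, C)
        knn_idx = dists.topk(self.k, dim=-1, largest=False).indices    # (B, Q, k)

        # gather the features of those k centers: (B, Q, k, F)
        idx = knn_idx.unsqueeze(-1).expand(-1, -1, -1, F)
        neighbor_feat = torch.gather(
            center_feat.unsqueeze(1).expand(-1, Q, -1, -1), 2, idx)

        # cross-attention of each query over only its k neighbors
        q = self.query_embed(queries).reshape(B * Q, 1, -1)            # (B*Q, 1, F)
        kv = neighbor_feat.reshape(B * Q, self.k, F)                   # (B*Q, k, F)
        fused, _ = self.attn(q, kv, kv)                                # (B*Q, 1, F)
        return self.out_head(fused).reshape(B, Q, 1)                   # (B, Q, 1)


if __name__ == "__main__":
    dec = NeighborhoodCrossAttention()
    pred = dec(torch.rand(2, 1024, 3), torch.rand(2, 196, 3), torch.rand(2, 196, 256))
    print(pred.shape)  # torch.Size([2, 1024, 1])
```

Restricting each query to k neighbors keeps the per-query attention cost independent of the total number of centers, which is the efficiency argument the abstract makes for the decoder's speed-up over attending to all visual tokens.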