OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects
July 11, 2024
Authors: Akshay Krishnan, Abhijit Kundu, Kevis-Kokitsi Maninis, James Hays, Matthew Brown
cs.AI
Abstract
We propose OmniNOCS, a large-scale monocular dataset with 3D Normalized Object Coordinate Space (NOCS) maps, object masks, and 3D bounding box annotations for indoor and outdoor scenes. OmniNOCS has 20 times more object classes and 200 times more instances than existing NOCS datasets (NOCS-Real275, Wild6D). We use OmniNOCS to train a novel, transformer-based monocular NOCS prediction model (NOCSformer) that can predict accurate NOCS, instance masks, and poses from 2D object detections across diverse classes. It is the first NOCS model that generalizes to a broad range of classes when prompted with 2D boxes. We evaluate our model on the task of 3D oriented bounding box prediction, where it achieves results comparable to state-of-the-art 3D detection methods such as Cube R-CNN. Unlike other 3D detection methods, our model also provides detailed and accurate 3D object shape and segmentation. Based on OmniNOCS, we propose a novel benchmark for the NOCS prediction task, which we hope will serve as a useful baseline for future work in this area. Our dataset and code will be available at the project website: https://omninocs.github.io.
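For readers unfamiliar with NOCS-based lifting, the sketch below illustrates the general idea of turning per-pixel NOCS predictions into a 3D oriented bounding box. It is not the paper's NOCSformer pipeline (which is monocular and regresses pose directly): it assumes corresponding camera-frame 3D points are available for the object's pixels (e.g., back-projected from depth) and uses standard Umeyama alignment to recover scale, rotation, and translation. All function names here are illustrative.

```python
import numpy as np


def umeyama_similarity(src, dst):
    """Estimate a similarity transform (scale s, rotation R, translation t)
    such that dst_i ~= s * R @ src_i + t, following Umeyama (1991)."""
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)              # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:  # guard against reflections
        S[2, 2] = -1.0
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = (D * np.diag(S)).sum() / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t


def lift_to_oriented_box(nocs_points, cam_points):
    """Lift per-pixel NOCS predictions to a 3D oriented bounding box.

    nocs_points: (N, 3) predicted NOCS coordinates in [0, 1]^3 for the
                 pixels of one object instance.
    cam_points:  (N, 3) matching 3D points in the camera frame
                 (e.g. back-projected from a depth map).
    Returns an (8, 3) array of box corners in camera coordinates.
    """
    src = nocs_points - 0.5                       # center the NOCS cube at the origin
    s, R, t = umeyama_similarity(src, cam_points)

    # Tight axis-aligned extents of the object inside the centered NOCS cube.
    lo, hi = src.min(axis=0), src.max(axis=0)
    center, half = (lo + hi) / 2.0, (hi - lo) / 2.0
    signs = np.array([[x, y, z]
                      for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)])
    corners_nocs = center + signs * half          # (8, 3) corners in NOCS
    return s * corners_nocs @ R.T + t             # corners in the camera frame
```

In the paper's monocular setting, NOCSformer replaces the depth-dependent alignment step by predicting object pose and scale directly from the NOCS map and the 2D box prompt; the geometric relationship between the NOCS cube and the oriented 3D box is the same.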