EgoLifter: Open-world 3D Segmentation for Egocentric Perception
March 26, 2024
Authors: Qiao Gu, Zhaoyang Lv, Duncan Frost, Simon Green, Julian Straub, Chris Sweeney
cs.AI
Abstract
In this paper we present EgoLifter, a novel system that can automatically
segment scenes captured from egocentric sensors into a complete decomposition
of individual 3D objects. The system is specifically designed for egocentric
data where scenes contain hundreds of objects captured from natural
(non-scanning) motion. EgoLifter adopts 3D Gaussians as the underlying
representation of 3D scenes and objects and uses segmentation masks from the
Segment Anything Model (SAM) as weak supervision to learn flexible and
promptable definitions of object instances free of any specific object
taxonomy. To handle the challenge of dynamic objects in egocentric videos, we
design a transient prediction module that learns to filter out dynamic objects
in the 3D reconstruction. The result is a fully automatic pipeline that is able
to reconstruct 3D object instances as collections of 3D Gaussians that
collectively compose the entire scene. We created a new benchmark on the Aria
Digital Twin dataset that quantitatively demonstrates EgoLifter's state-of-the-art
performance in open-world 3D segmentation from natural egocentric input. We also
run EgoLifter on various egocentric activity datasets, which shows the promise of
the method for 3D egocentric perception at scale.