EgoLifter: Open-world 3D Segmentation for Egocentric Perception
March 26, 2024
Authors: Qiao Gu, Zhaoyang Lv, Duncan Frost, Simon Green, Julian Straub, Chris Sweeney
cs.AI
Abstract
In this paper we present EgoLifter, a novel system that can automatically
segment scenes captured from egocentric sensors into a complete decomposition
of individual 3D objects. The system is specifically designed for egocentric
data where scenes contain hundreds of objects captured from natural
(non-scanning) motion. EgoLifter adopts 3D Gaussians as the underlying
representation of 3D scenes and objects and uses segmentation masks from the
Segment Anything Model (SAM) as weak supervision to learn flexible and
promptable definitions of object instances free of any specific object
taxonomy. To handle the challenge of dynamic objects in egocentric videos, we
design a transient prediction module that learns to filter out dynamic objects
in the 3D reconstruction. The result is a fully automatic pipeline that is able
to reconstruct 3D object instances as collections of 3D Gaussians that
collectively compose the entire scene. We created a new benchmark on the Aria
Digital Twin dataset that quantitatively demonstrates EgoLifter's
state-of-the-art performance in open-world 3D segmentation from natural
egocentric input. We run EgoLifter on various egocentric activity datasets,
demonstrating the method's promise for 3D egocentric perception at scale.
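The abstract's two key ingredients, lifting 2D SAM masks into instance features through a contrastive objective and down-weighting pixels flagged by the transient prediction module, can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's exact formulation; the function, the loss shape, and all variable names are our assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def contrastive_mask_loss(features, mask, transient_prob, temperature=0.1):
    """Pull rendered per-pixel instance features inside one SAM mask together
    and push out-of-mask features away, while down-weighting pixels the
    transient module predicts are dynamic. Shapes (illustrative):
      features:       (H, W, D) rendered per-pixel feature vectors
      mask:           (H, W) boolean SAM mask for one 2D instance
      transient_prob: (H, W) in [0, 1], probability a pixel is transient
    """
    f = features / (np.linalg.norm(features, axis=-1, keepdims=True) + 1e-8)
    w = 1.0 - transient_prob                  # static pixels get full weight
    inside, win = f[mask], w[mask]
    outside, wout = f[~mask], w[~mask]
    # Weighted mean feature of the mask serves as the positive anchor.
    anchor = (inside * win[:, None]).sum(0) / (win.sum() + 1e-8)
    pos = (inside @ anchor) / temperature     # similarity to anchor, in-mask
    neg = (outside @ anchor) / temperature    # similarity to anchor, out-of-mask
    # Binary contrastive objective: in-mask pixels should match the anchor,
    # out-of-mask pixels should not; transient pixels contribute little.
    loss_pos = -(win * np.log(sigmoid(pos) + 1e-8)).sum()
    loss_neg = -(wout * np.log(1.0 - sigmoid(neg) + 1e-8)).sum()
    return (loss_pos + loss_neg) / (win.sum() + wout.sum() + 1e-8)
```

Because the masks only say "these pixels belong together" rather than naming a category, such a loss yields instance features free of any fixed taxonomy, matching the abstract's promptable, open-world framing.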