
EmbodiedSplat: Online Feed-Forward Semantic 3DGS for Open-Vocabulary 3D Scene Understanding

March 4, 2026
Authors: Seungjun Lee, Zihan Wang, Yunsong Wang, Gim Hee Lee
cs.AI

Abstract

Embodied tasks require an agent to construct and comprehend a 3D scene online and in nearly real time, so understanding the scene as it is being explored is essential. In this study, we propose EmbodiedSplat, an online feed-forward 3D Gaussian Splatting (3DGS) method for open-vocabulary scene understanding that performs online 3D reconstruction and 3D semantic understanding simultaneously from streaming images. Unlike existing open-vocabulary 3DGS methods, which are typically restricted to offline or per-scene optimization settings, our objectives are two-fold: 1) to reconstruct the semantic-embedded 3DGS of an entire scene from over 300 streaming images in an online manner; and 2) to be highly generalizable to novel scenes through a feed-forward design and to support nearly real-time 3D semantic reconstruction when combined with real-time 2D models. To achieve these objectives, we propose an Online Sparse Coefficients Field with a CLIP Global Codebook, which binds 2D CLIP embeddings to each 3D Gaussian while minimizing memory consumption and preserving the full semantic generalizability of CLIP. Furthermore, we generate 3D geometry-aware CLIP features by aggregating the partial point cloud of the 3DGS through a 3D U-Net, compensating for the 3D geometric prior that 2D-oriented language embeddings lack. Extensive experiments on diverse indoor datasets, including ScanNet, ScanNet++, and Replica, demonstrate both the effectiveness and efficiency of our method. Project page: https://0nandon.github.io/EmbodiedSplat/.
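
The abstract does not spell out how the Online Sparse Coefficients Field is built, so the following is only a toy sketch of the general idea it names: each Gaussian stores a few (index, weight) pairs into a shared codebook of embeddings rather than a full embedding vector. The class name, the `top_k` and `merge_threshold` parameters, the additive update rule, and the 3-dimensional stand-in vectors are all assumptions made for illustration; none of them come from the paper, and no real CLIP model is used.

```python
import math


def normalize(v):
    """Return v scaled to unit length (or a copy of v if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


class SparseCoefficientField:
    """Hypothetical toy: each Gaussian keeps at most `top_k` weights over a shared codebook."""

    def __init__(self, top_k=2, merge_threshold=0.95):
        self.codebook = []   # shared list of unit-norm embeddings
        self.coeffs = {}     # gaussian_id -> {codebook_index: weight}
        self.top_k = top_k
        self.merge_threshold = merge_threshold

    def _codebook_index(self, embedding):
        # Reuse an existing entry if one is similar enough, else append a new one.
        e = normalize(embedding)
        for i, entry in enumerate(self.codebook):
            if cosine(e, entry) >= self.merge_threshold:
                return i
        self.codebook.append(e)
        return len(self.codebook) - 1

    def observe(self, gaussian_id, embedding, weight=1.0):
        # Online update: accumulate weight on the matched codebook entry,
        # then keep only the top_k largest coefficients for this Gaussian.
        idx = self._codebook_index(embedding)
        c = self.coeffs.setdefault(gaussian_id, {})
        c[idx] = c.get(idx, 0.0) + weight
        if len(c) > self.top_k:
            keep = sorted(c, key=c.get, reverse=True)[: self.top_k]
            self.coeffs[gaussian_id] = {i: c[i] for i in keep}

    def feature(self, gaussian_id):
        # Reconstruct a dense feature as the normalized weighted sum of codebook entries.
        dim = len(self.codebook[0])
        out = [0.0] * dim
        for i, w in self.coeffs[gaussian_id].items():
            for d in range(dim):
                out[d] += w * self.codebook[i][d]
        return normalize(out)

    def query(self, text_embedding):
        # Open-vocabulary query: cosine score of every Gaussian against a text embedding.
        return {g: cosine(self.feature(g), text_embedding) for g in self.coeffs}


# Stand-in "embeddings" for two concepts; real CLIP vectors would be much larger.
chair, table = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]

field = SparseCoefficientField(top_k=2)
field.observe(0, chair)   # frame 1 sees Gaussian 0 as "chair"
field.observe(0, chair)   # frame 2 agrees, so the same codebook entry is reused
field.observe(1, table)   # Gaussian 1 is seen as "table"

scores = field.query(chair)
best = max(scores, key=scores.get)
```

In this sketch the memory per Gaussian grows with `top_k` rather than with the embedding dimension, and the codebook entries themselves are left unmodified, which is one plausible reading of how such a design could keep memory low while retaining the original embedding space.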