LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS

July 9, 2025
Authors: Wanhua Li, Yujie Zhao, Minghan Qin, Yang Liu, Yuanhao Cai, Chuang Gan, Hanspeter Pfister
cs.AI

Abstract

In this paper, we introduce LangSplatV2, which achieves high-dimensional feature splatting at 476.2 FPS and 3D open-vocabulary text querying at 384.6 FPS on high-resolution images, a 42× and a 47× speedup over LangSplat, respectively, along with improved query accuracy. LangSplat employs Gaussian Splatting to embed 2D CLIP language features into 3D, which significantly improves speed, and learns a precise 3D language field with SAM semantics. Such advances in 3D language fields are crucial for applications that require language interaction within complex scenes. However, LangSplat does not yet achieve real-time inference (8.2 FPS), even on an advanced A100 GPU, which severely limits its broader application. In this paper, we first conduct a detailed time analysis of LangSplat and identify the heavyweight decoder as the primary speed bottleneck. Our solution, LangSplatV2, assumes that each Gaussian acts as a sparse code within a global dictionary, which leads to learning a 3D sparse coefficient field that entirely eliminates the need for a heavyweight decoder. By leveraging this sparsity, we further propose an efficient sparse coefficient splatting method with CUDA optimization, rendering high-quality, high-dimensional feature maps while incurring only the time cost of splatting an ultra-low-dimensional feature. Our experimental results demonstrate that LangSplatV2 not only achieves better or competitive query accuracy but is also significantly faster. Code and demos are available at our project page: https://langsplat-v2.github.io.
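The sparse-code idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation or their CUDA kernel; the sizes (`N`, `L`, `K`, `D`), the random dictionary, and the helper names are all assumptions chosen for illustration. Because alpha blending is linear, blending each Gaussian's few non-zero coefficients and applying the global dictionary once per pixel gives the same feature as blending full high-dimensional features, which is one way to see why no heavyweight decoder is needed.

```python
import random

random.seed(0)

# Illustrative sizes (assumptions, not values from the paper):
# N Gaussians along one pixel's ray, L dictionary entries,
# K non-zero coefficients per Gaussian, D-dimensional features.
N, L, K, D = 6, 16, 4, 32

# Global dictionary shared by all Gaussians: L basis vectors of dimension D.
dictionary = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(L)]

def random_sparse_code():
    """Hypothetical sparse code: K non-zero coefficients as {index: value}."""
    indices = random.sample(range(L), K)
    return {i: random.random() for i in indices}

codes = [random_sparse_code() for _ in range(N)]

# Standard front-to-back alpha blending weights for one pixel:
# w_i = alpha_i * prod_{j < i} (1 - alpha_j)
alphas = [random.uniform(0.1, 0.6) for _ in range(N)]
weights, transmittance = [], 1.0
for alpha in alphas:
    weights.append(alpha * transmittance)
    transmittance *= 1.0 - alpha

def decode(code):
    """Expand one sparse code into a full D-dimensional feature."""
    return [sum(c * dictionary[i][d] for i, c in code.items()) for d in range(D)]

# Route A: expand every Gaussian to D dimensions, then blend (expensive).
dense = [0.0] * D
for w, code in zip(weights, codes):
    feature = decode(code)
    dense = [acc + w * f for acc, f in zip(dense, feature)]

# Route B: blend only the K non-zero coefficients per Gaussian,
# then apply the dictionary once for the pixel.
blended = [0.0] * L
for w, code in zip(weights, codes):
    for i, c in code.items():
        blended[i] += w * c
sparse = [sum(blended[i] * dictionary[i][d] for i in range(L)) for d in range(D)]

# Both routes agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(dense, sparse))
print(f"max abs difference: {max_diff:.2e}")
```

In Route B each Gaussian contributes only `K` multiply-adds per pixel instead of `D`, which matches the abstract's claim that the cost is that of splatting an ultra-low-dimensional feature.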