Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction
October 6, 2025
Authors: Chi Yan, Dan Xu
cs.AI
Abstract
The 3D occupancy prediction task has witnessed remarkable progress in recent
years, playing a crucial role in vision-based autonomous driving systems. While
traditional methods are limited to fixed semantic categories, recent approaches
have moved towards predicting text-aligned features to enable open-vocabulary
text queries in real-world scenes. However, there exists a trade-off in
text-aligned scene modeling: sparse Gaussian representations struggle to
capture small objects in the scene, while dense representations incur
significant computational overhead. To address these limitations, we present
PG-Occ, an innovative Progressive Gaussian Transformer framework that enables
open-vocabulary 3D occupancy prediction. Our framework employs progressive
online densification, a feed-forward strategy that gradually enhances the 3D
Gaussian representation to capture fine-grained scene details. By iteratively
refining the representation, the framework achieves increasingly precise and
detailed scene understanding. Another key contribution is the introduction of
an anisotropy-aware sampling strategy with spatio-temporal fusion, which
adaptively assigns receptive fields to Gaussians at different scales and
stages, enabling more effective feature aggregation and richer scene
information capture. Through extensive evaluations, we demonstrate that PG-Occ
achieves state-of-the-art performance with a relative 14.3% mIoU improvement
over the previous best-performing method. Code and pretrained models will be
released upon publication on our project page:
https://yanchi-3dv.github.io/PG-Occ
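To make the abstract's technical ideas concrete, the sketches below illustrate them in PyTorch; they are hypothetical reconstructions from the abstract alone, not the authors' code. First, open-vocabulary querying: once each Gaussian carries a text-aligned feature, a text query reduces to cosine similarity against embedded query phrases. The function names, tensor shapes, and the choice of a CLIP-style text encoder are all assumptions.

```python
import torch
import torch.nn.functional as F

def open_vocab_query(gaussian_feats, text_embeds):
    """Label each Gaussian with its best-matching text query (illustrative).

    gaussian_feats: (N, C) predicted text-aligned features, one per Gaussian
    text_embeds:    (Q, C) embeddings of the query phrases (e.g. from a
                    CLIP-style text encoder -- an assumption, not stated
                    in the abstract)
    Returns:        (N,) index of the most similar query per Gaussian
    """
    sims = F.normalize(gaussian_feats, dim=-1) @ F.normalize(text_embeds, dim=-1).T
    return sims.argmax(dim=-1)  # hard assignment; score thresholds omitted
```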
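The progressive online densification described in the abstract can be pictured as a feed-forward loop that, at each stage, selects Gaussians whose region is under-modeled and spawns finer children there. Everything below (the score_head, the top-k selection criterion, the child offsets and shrink factor) is an assumed design sketch, not the paper's actual rule.

```python
import torch

def densify_gaussians(means, scales, feats, score_head,
                      top_ratio=0.25, offset_scale=0.5):
    """One feed-forward densification stage (illustrative only).

    means:      (N, 3) Gaussian centers
    scales:     (N, 3) per-axis extents (anisotropic)
    feats:      (N, C) text-aligned feature vectors
    score_head: hypothetical module mapping (N, C) -> (N, 1) refinement scores
    """
    scores = score_head(feats).squeeze(-1)       # (N,) where detail is lacking
    k = max(1, int(top_ratio * means.shape[0]))
    idx = scores.topk(k).indices                 # Gaussians needing finer detail

    # Spawn two children per selected parent, offset by the parent's own
    # extent and shrunk so later stages capture progressively finer structure.
    offsets = offset_scale * scales[idx]                      # (k, 3)
    child_means = torch.cat([means[idx] + offsets,
                             means[idx] - offsets], dim=0)    # (2k, 3)
    child_scales = scales[idx].repeat(2, 1) * 0.5             # (2k, 3)
    child_feats = feats[idx].repeat(2, 1)                     # inherit features

    return (torch.cat([means, child_means], dim=0),
            torch.cat([scales, child_scales], dim=0),
            torch.cat([feats, child_feats], dim=0))
```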
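Anisotropy-aware sampling can likewise be read as drawing feature-aggregation points whose spread follows each Gaussian's shape, so an elongated Gaussian receives a correspondingly elongated receptive field. The parameterization below is an assumption; in a full pipeline the sampled points would then be projected into multi-view image features and aggregated per Gaussian, which is where the spatio-temporal fusion the abstract mentions would enter.

```python
import torch

def anisotropic_sample_points(means, scales, rotations, n_samples=8):
    """Sample points whose spread follows each Gaussian's anisotropic shape.

    means:     (N, 3) centers
    scales:    (N, 3) per-axis standard deviations
    rotations: (N, 3, 3) rotation matrices (columns = principal axes)
    Returns:   (N, n_samples, 3) sampling locations.
    """
    # Unit-Gaussian offsets, stretched per axis, then rotated into each
    # Gaussian's own frame: long axes get proportionally wider receptive fields.
    eps = torch.randn(means.shape[0], n_samples, 3, device=means.device)
    local = eps * scales.unsqueeze(1)                        # (N, S, 3)
    world = torch.einsum('nij,nsj->nsi', rotations, local)   # rotate offsets
    return means.unsqueeze(1) + world
```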