Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction

October 6, 2025
Authors: Chi Yan, Dan Xu
cs.AI

Abstract

The 3D occupancy prediction task has witnessed remarkable progress in recent years, playing a crucial role in vision-based autonomous driving systems. While traditional methods are limited to fixed semantic categories, recent approaches have moved towards predicting text-aligned features to enable open-vocabulary text queries in real-world scenes. However, text-aligned scene modeling involves a trade-off: sparse Gaussian representations struggle to capture small objects in the scene, while dense representations incur significant computational overhead. To address these limitations, we present PG-Occ, an innovative Progressive Gaussian Transformer framework that enables open-vocabulary 3D occupancy prediction. Our framework employs progressive online densification, a feed-forward strategy that gradually enhances the 3D Gaussian representation to capture fine-grained scene details. By iteratively refining the representation, the framework achieves increasingly precise and detailed scene understanding. Another key contribution is an anisotropy-aware sampling strategy with spatio-temporal fusion, which adaptively assigns receptive fields to Gaussians at different scales and stages, enabling more effective feature aggregation and richer scene information capture. Through extensive evaluations, we demonstrate that PG-Occ achieves state-of-the-art performance with a relative 14.3% mIoU improvement over the previous best-performing method. Code and pretrained models will be released upon publication on our project page: https://yanchi-3dv.github.io/PG-Occ.
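To make the progressive online densification idea concrete, below is a minimal, hypothetical PyTorch sketch: a feed-forward head scores the current Gaussians and spawns children around the highest-scoring ones, so representational capacity is added only where the current set falls short. The module name, the two heads, and the split rule (ProgressiveDensifier, score_head, offset_head, the 0.5 scale shrink, the 25% top ratio) are illustrative assumptions, not the paper's actual design.

```python
# Hypothetical sketch of progressive online densification: each stage scores
# the existing Gaussians and adds children near the high-scoring ones, so the
# enlarged set can be refined again at the next stage. Illustrative only.
import torch
import torch.nn as nn

class ProgressiveDensifier(nn.Module):
    def __init__(self, feat_dim: int, top_ratio: float = 0.25):
        super().__init__()
        self.score_head = nn.Linear(feat_dim, 1)   # "needs refinement" score
        self.offset_head = nn.Linear(feat_dim, 3)  # where to place the child
        self.top_ratio = top_ratio

    def forward(self, means, scales, feats):
        """means: (N, 3), scales: (N, 3), feats: (N, C) per-Gaussian features."""
        scores = self.score_head(feats).squeeze(-1)          # (N,)
        k = max(1, int(self.top_ratio * means.shape[0]))
        idx = scores.topk(k).indices                          # Gaussians to densify
        # Children sit at a bounded, predicted offset from the parent and
        # inherit a shrunken scale, capturing finer detail around it.
        child_means = means[idx] + torch.tanh(self.offset_head(feats[idx])) * scales[idx]
        child_scales = scales[idx] * 0.5
        return (torch.cat([means, child_means]),
                torch.cat([scales, child_scales]),
                torch.cat([feats, feats[idx]]))
```

Running a few such stages in sequence yields the coarse-to-fine behavior the abstract describes: early stages model the scene layout cheaply, later stages spend extra Gaussians only around under-modeled regions.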
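Similarly, a minimal sketch of what anisotropy-aware sampling could look like, assuming each Gaussian carries a mean, a rotation matrix, and per-axis scales: sample offsets are stretched along the Gaussian's principal axes, so elongated Gaussians receive correspondingly elongated receptive fields. The function name and the exact sampling pattern are assumptions for illustration.

```python
# Hypothetical anisotropy-aware sampling: place sample points along each
# Gaussian's principal axes, stretched by its per-axis extents. Illustrative only.
import torch

def anisotropy_aware_offsets(rotation: torch.Tensor,
                             scales: torch.Tensor,
                             n_samples_per_axis: int = 2) -> torch.Tensor:
    """rotation: (N, 3, 3) rotation matrices (columns = principal axes).
    scales:   (N, 3) per-axis standard deviations.
    Returns:  (N, 3 * n_samples_per_axis * 2 + 1, 3) offsets from each mean.
    """
    # Signed fractions of each axis length, e.g. [-1.0, -0.5, 0.5, 1.0].
    steps = torch.linspace(0.5, 1.0, n_samples_per_axis, device=scales.device)
    steps = torch.cat([-steps.flip(0), steps])                     # (2K,)
    # Unit offsets along the three local axes, plus the center point.
    eye = torch.eye(3, device=scales.device)
    local = (eye[:, None, :] * steps[None, :, None]).reshape(-1, 3)  # (6K, 3)
    local = torch.cat([torch.zeros(1, 3, device=scales.device), local])
    # Stretch by per-axis extent, then rotate into world coordinates.
    offsets = local[None] * scales[:, None, :]                     # (N, S, 3)
    offsets = torch.einsum('nij,nsj->nsi', rotation, offsets)
    return offsets
```

The resulting points (mean plus offsets) would then be projected into the camera views to gather image features for each Gaussian, e.g. via bilinear sampling, so that a thin, elongated Gaussian aggregates evidence along its full extent rather than from a fixed isotropic neighborhood.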