Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction
October 6, 2025
Authors: Chi Yan, Dan Xu
cs.AI
Abstract
The 3D occupancy prediction task has witnessed remarkable progress in recent
years, playing a crucial role in vision-based autonomous driving systems. While
traditional methods are limited to fixed semantic categories, recent approaches
have moved towards predicting text-aligned features to enable open-vocabulary
text queries in real-world scenes. However, text-aligned scene modeling
involves a trade-off: sparse Gaussian representations struggle to capture
small objects, while dense representations incur significant computational
overhead. To address these limitations, we present PG-Occ, an innovative
Progressive Gaussian Transformer framework that enables
open-vocabulary 3D occupancy prediction. Our framework employs progressive
online densification, a feed-forward strategy that gradually enhances the 3D
Gaussian representation to capture fine-grained scene details. By iteratively
refining this representation, the framework achieves increasingly precise and
detailed scene understanding. Another key contribution is the introduction of
an anisotropy-aware sampling strategy with spatio-temporal fusion, which
adaptively assigns receptive fields to Gaussians at different scales and
stages, enabling more effective feature aggregation and richer scene
information capture. Through extensive evaluations, we demonstrate that PG-Occ
achieves state-of-the-art performance with a relative 14.3% mIoU improvement
over the previous best-performing method. Code and pretrained models will be
released upon publication on our project page:
https://yanchi-3dv.github.io/PG-Occ
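
As a reading aid, the progressive online densification described above can be pictured as a feed-forward loop that scores the current Gaussians and spawns smaller child Gaussians around the most important ones. Below is a minimal PyTorch sketch of that idea only; the names (DensifyStep, score_head, offset_head) and the choice to halve child scales are illustrative assumptions, not the paper's actual design.

import torch
import torch.nn as nn

class DensifyStep(nn.Module):
    """One feed-forward densification stage (illustrative): score each
    Gaussian, then spawn a smaller child near each of the top-k parents."""
    def __init__(self, feat_dim: int, k: int = 1024):
        super().__init__()
        self.score_head = nn.Linear(feat_dim, 1)   # per-Gaussian importance
        self.offset_head = nn.Linear(feat_dim, 3)  # child placement offset
        self.k = k

    def forward(self, means, log_scales, feats):
        # means: (N, 3), log_scales: (N, 3), feats: (N, C)
        scores = self.score_head(feats).squeeze(-1)            # (N,)
        top = scores.topk(min(self.k, scores.numel())).indices
        # Children are offset within the parent's extent and start at
        # half the parent's size, so each stage adds finer detail.
        offsets = torch.tanh(self.offset_head(feats[top])) * log_scales[top].exp()
        child_means = means[top] + offsets
        child_scales = log_scales[top] - 0.6931  # subtract ln(2): halve the scale
        return (torch.cat([means, child_means]),
                torch.cat([log_scales, child_scales]),
                torch.cat([feats, feats[top]]))

Stacking several such stages yields the progressively enhanced representation the abstract refers to, with the Gaussian count growing only where the predicted scores indicate missing detail.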
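
The anisotropy-aware sampling strategy can likewise be read as letting each Gaussian's shape determine its receptive field: sample offsets are stretched by the per-axis scales and rotated into world space, so an elongated Gaussian gathers features along its elongation. A minimal sketch under an assumed scale-plus-rotation parameterization (the function name anisotropic_sample_points is hypothetical):

import torch

def anisotropic_sample_points(means, scales, rotations, n_samples=8):
    """means: (N, 3); scales: (N, 3) per-axis extents;
    rotations: (N, 3, 3) rotation matrices. Returns (N, n_samples, 3)."""
    n = means.shape[0]
    # Random unit offsets in each Gaussian's canonical frame.
    unit = torch.randn(n, n_samples, 3, device=means.device)
    # Stretch by per-axis scale (the anisotropy), then rotate to world frame.
    local = unit * scales.unsqueeze(1)                      # (N, S, 3)
    world = torch.einsum('nij,nsj->nsi', rotations, local)  # (N, S, 3)
    return means.unsqueeze(1) + world

In a full pipeline these points would be projected into multi-view (and past-frame) image features and aggregated back onto each Gaussian, which is where the spatio-temporal fusion would occur; that machinery is omitted here.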