
Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction

October 6, 2025
Authors: Chi Yan, Dan Xu
cs.AI

Abstract

The 3D occupancy prediction task has witnessed remarkable progress in recent years, playing a crucial role in vision-based autonomous driving systems. While traditional methods are limited to fixed semantic categories, recent approaches have moved towards predicting text-aligned features to enable open-vocabulary text queries in real-world scenes. However, there is a trade-off in text-aligned scene modeling: sparse Gaussian representations struggle to capture small objects in the scene, while dense representations incur significant computational overhead. To address these limitations, we present PG-Occ, an innovative Progressive Gaussian Transformer framework that enables open-vocabulary 3D occupancy prediction. Our framework employs progressive online densification, a feed-forward strategy that gradually enhances the 3D Gaussian representation to capture fine-grained scene details. By iteratively enhancing the representation, the framework achieves increasingly precise and detailed scene understanding. Another key contribution is the introduction of an anisotropy-aware sampling strategy with spatio-temporal fusion, which adaptively assigns receptive fields to Gaussians at different scales and stages, enabling more effective feature aggregation and richer scene information capture. Through extensive evaluations, we demonstrate that PG-Occ achieves state-of-the-art performance with a relative 14.3% mIoU improvement over the previous best-performing method. Code and pretrained models will be released upon publication on our project page: https://yanchi-3dv.github.io/PG-Occ
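To make the two named mechanisms concrete, the sketch below shows one plausible reading of them: sample offsets stretched along each Gaussian's principal axes so that elongated Gaussians get elongated receptive fields (anisotropy-aware sampling), and a feed-forward step that splits high-scoring coarse Gaussians into smaller children (progressive online densification). This is a minimal illustration only, not the authors' released code; the function names, the split factor, and the refinement-score heuristic are all assumptions.

```python
# Minimal PyTorch sketch of the two ideas named in the abstract.
# All shapes, names, and heuristics here are illustrative assumptions.
import torch

def anisotropy_aware_offsets(scales, rotations, n_per_axis=2):
    """Sample offsets along each Gaussian's principal axes, scaled by its extent.

    scales:    (N, 3) per-axis standard deviations of each Gaussian
    rotations: (N, 3, 3) rotation matrices (principal axes as columns)
    Returns (N, K, 3) world-space offsets, K = 3 * n_per_axis.
    """
    N = scales.shape[0]
    # Signed steps along each local axis, e.g. [-1, +1] sigma for n_per_axis=2.
    steps = torch.linspace(-1.0, 1.0, n_per_axis)                   # (n,)
    eye = torch.eye(3)                                              # local axes
    local = eye[None, :, None, :] * steps[None, None, :, None]      # (1, 3, n, 3)
    local = local.reshape(1, -1, 3).expand(N, -1, -1)               # (N, K, 3)
    # Stretch by per-axis scale (anisotropy), then rotate into the world frame.
    local = local * scales[:, None, :]
    return torch.einsum('nij,nkj->nki', rotations, local)

def densify(means, scales, scores, top_frac=0.25, split=2):
    """Feed-forward densification: split the top-scoring Gaussians into children."""
    k = max(1, int(top_frac * means.shape[0]))
    idx = scores.topk(k).indices
    # Place children near each selected parent, jittered within its extent.
    jitter = torch.randn(k, split, 3) * scales[idx, None, :]
    child_means = (means[idx, None, :] + jitter).reshape(-1, 3)
    child_scales = (scales[idx, None, :].expand(-1, split, -1) / split).reshape(-1, 3)
    return torch.cat([means, child_means]), torch.cat([scales, child_scales])

if __name__ == "__main__":
    N = 8
    means = torch.randn(N, 3)
    scales = torch.rand(N, 3) * 0.5
    rotations = torch.eye(3).expand(N, 3, 3)
    offsets = anisotropy_aware_offsets(scales, rotations)
    print(offsets.shape)          # torch.Size([8, 6, 3])
    scores = torch.rand(N)        # stand-in for a learned refinement score
    means, scales = densify(means, scales, scores)
    print(means.shape)            # more Gaussians after one densification stage
```

In this reading, the offsets would be projected into multi-view image features for aggregation, and the densification step would be repeated at each stage of the progressive framework; both details are beyond what the abstract specifies.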