GASP: Unifying Geometric and Semantic Self-Supervised Pre-training for Autonomous Driving

March 19, 2025
Authors: William Ljungbergh, Adam Lilja, Adam Tonderski, Arvid Laveno Ling, Carl Lindström, Willem Verbeke, Junsheng Fu, Christoffer Petersson, Lars Hammarstrand, Michael Felsberg
cs.AI

Abstract

Self-supervised pre-training based on next-token prediction has enabled large language models to capture the underlying structure of text and, when applied at scale, has led to unprecedented performance on a wide array of tasks. Similarly, autonomous driving generates vast amounts of spatiotemporal data, suggesting that scale could be harnessed to learn the underlying geometric and semantic structure of the environment and its evolution over time. In this direction, we propose GASP, a geometric and semantic self-supervised pre-training method that learns a unified representation by predicting, at any queried future point in spacetime, (1) general occupancy, capturing the evolving structure of the 3D scene; (2) ego occupancy, modeling the ego vehicle's path through the environment; and (3) distilled high-level features from a vision foundation model. By modeling geometric and semantic 4D occupancy fields instead of raw sensor measurements, the model learns a structured, generalizable representation of the environment and its evolution through time. We validate GASP on multiple autonomous driving benchmarks, demonstrating significant improvements in semantic occupancy forecasting, online mapping, and ego trajectory prediction. Our results demonstrate that continuous 4D geometric and semantic occupancy prediction provides a scalable and effective pre-training paradigm for autonomous driving. For code and additional visualizations, see https://research.zenseact.com/publications/gasp/.
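The abstract describes one network that, given a scene representation, is queried at arbitrary future points in spacetime and predicts three quantities per query. The NumPy sketch below illustrates that query-and-supervise pattern under stated assumptions; it is not the authors' implementation. The following choices are all illustrative assumptions: the shared MLP decoder, the single global scene latent (a simplification standing in for whatever spatial features the real encoder produces), binary cross-entropy for the two occupancy heads, a cosine-similarity distillation loss, equal loss weights, and every function and constant name. The labels and teacher features are random stand-ins.

```python
import numpy as np

# Hypothetical sizes; none of these come from the GASP paper.
LATENT_DIM = 16   # width of the (simplified) scene latent
HIDDEN = 32       # hidden width of the query decoder
FEAT_DIM = 8      # width of the distilled foundation-model feature

rng = np.random.default_rng(0)


def init_params() -> dict:
    """Random weights for a tiny MLP query decoder (illustrative only)."""
    in_dim = LATENT_DIM + 4  # scene latent concatenated with an (x, y, z, t) query
    return {
        "w1": rng.normal(0.0, 0.1, (in_dim, HIDDEN)),
        "b1": np.zeros(HIDDEN),
        "w_occ": rng.normal(0.0, 0.1, (HIDDEN, 1)),
        "w_ego": rng.normal(0.0, 0.1, (HIDDEN, 1)),
        "w_feat": rng.normal(0.0, 0.1, (HIDDEN, FEAT_DIM)),
    }


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def query_field(params: dict, scene_latent: np.ndarray, queries: np.ndarray):
    """Decode all three targets at arbitrary spacetime points.

    scene_latent: (LATENT_DIM,) vector standing in for the encoder output (assumption).
    queries: (N, 4) array of (x, y, z, t) query points.
    """
    n = queries.shape[0]
    x = np.concatenate([np.tile(scene_latent, (n, 1)), queries], axis=1)
    h = np.maximum(x @ params["w1"] + params["b1"], 0.0)  # shared ReLU trunk
    occ = sigmoid(h @ params["w_occ"])[:, 0]  # P(point is occupied at time t)
    ego = sigmoid(h @ params["w_ego"])[:, 0]  # P(ego vehicle occupies point at time t)
    feat = h @ params["w_feat"]               # predicted foundation-model feature
    return occ, ego, feat


def bce(p: np.ndarray, y: np.ndarray, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between probabilities p and 0/1 labels y."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def cosine_distill_loss(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Mean (1 - cosine similarity) between predicted and teacher features."""
    pn = pred / (np.linalg.norm(pred, axis=1, keepdims=True) + eps)
    tn = target / (np.linalg.norm(target, axis=1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(pn * tn, axis=1)))


def pretraining_loss(params, scene_latent, queries, occ_lbl, ego_lbl, feat_tgt,
                     weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three per-query losses (loss forms and weights are assumptions)."""
    occ, ego, feat = query_field(params, scene_latent, queries)
    parts = (bce(occ, occ_lbl), bce(ego, ego_lbl), cosine_distill_loss(feat, feat_tgt))
    total = sum(w * l for w, l in zip(weights, parts))
    return total, parts


# Usage with random stand-in data (no real sensor data or labels).
params = init_params()
scene_latent = rng.normal(size=LATENT_DIM)
queries = rng.uniform(-1.0, 1.0, size=(64, 4))   # (x, y, z, t) samples
occ_lbl = (rng.random(64) > 0.5).astype(float)   # hypothetical occupancy labels
ego_lbl = (rng.random(64) > 0.9).astype(float)   # hypothetical ego-path labels
feat_tgt = rng.normal(size=(64, FEAT_DIM))       # hypothetical teacher features
total, parts = pretraining_loss(params, scene_latent, queries, occ_lbl, ego_lbl, feat_tgt)
print(f"total={total:.4f}  occ={parts[0]:.4f}  ego={parts[1]:.4f}  feat={parts[2]:.4f}")
```

Because the decoder takes the query coordinates as input, supervision can be placed at any sampled (x, y, z, t) rather than on a fixed voxel grid. This is what lets the predicted occupancy and feature fields be treated as continuous in space and time.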
