Towards More Diverse and Challenging Pre-training for Point Cloud Learning: Self-Supervised Cross Reconstruction with Decoupled Views
September 1, 2025
Authors: Xiangdong Zhang, Shaofeng Zhang, Junchi Yan
cs.AI
Abstract
Point cloud learning, especially in a self-supervised way without manual
labels, has gained growing attention in both vision and learning communities
due to its potential utility in a wide range of applications. Most existing
generative approaches for point cloud self-supervised learning focus on
recovering masked points from visible ones within a single view. A two-view
pre-training paradigm, by contrast, inherently introduces greater diversity
and variance, which may enable more challenging and informative pre-training.
Motivated by this, we explore the potential of two-view learning in this domain.
In this paper, we propose Point-PQAE, a cross-reconstruction generative
paradigm that first generates two decoupled point clouds/views and then
reconstructs one from the other. To achieve this, we develop the first crop
mechanism for point cloud view generation and further propose a novel
positional encoding to represent the 3D relative position
between the two decoupled views. The cross-reconstruction significantly
increases the difficulty of pre-training compared to self-reconstruction, which
enables our method to surpass previous single-modal self-reconstruction methods
in 3D self-supervised learning. Specifically, it outperforms the
self-reconstruction baseline (Point-MAE) by 6.5%, 7.0%, and 6.7% on the three
variants of ScanObjectNN under the MLP-Linear evaluation protocol. The code is
available at https://github.com/aHapBean/Point-PQAE.
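To make the view-generation step described above concrete, below is a minimal
sketch of how two decoupled views and their 3D relative offset might be
produced. It assumes a nearest-neighbor crop around a random seed point and
uses the difference of view centroids as the relative position that a
positional encoding could embed; the function names and crop strategy are
illustrative assumptions, not the authors' released implementation (see the
repository linked above for that).

```python
# Illustrative sketch (not the authors' code): crop two decoupled views
# from one point cloud and compute their 3D relative offset.
import numpy as np

def crop_view(points: np.ndarray, ratio: float, rng: np.random.Generator) -> np.ndarray:
    """Crop one view: keep the `ratio` fraction of points nearest a random seed point."""
    seed = points[rng.integers(len(points))]        # random crop center
    dists = np.linalg.norm(points - seed, axis=1)   # distance of every point to the seed
    keep = np.argsort(dists)[: int(len(points) * ratio)]
    return points[keep]

def decoupled_views(points: np.ndarray, ratio: float = 0.6, seed: int = 0):
    """Generate two independently cropped (decoupled) views plus their relative offset."""
    rng = np.random.default_rng(seed)
    v1 = crop_view(points, ratio, rng)
    v2 = crop_view(points, ratio, rng)
    # Re-center each view so it carries no absolute position; the centroid
    # offset is what a relative positional encoding would then represent.
    c1, c2 = v1.mean(axis=0), v2.mean(axis=0)
    rel_pos = c2 - c1                               # 3D relative position between views
    return v1 - c1, v2 - c2, rel_pos

if __name__ == "__main__":
    pts = np.random.default_rng(42).standard_normal((1024, 3)).astype(np.float32)
    view1, view2, rel = decoupled_views(pts)
    print(view1.shape, view2.shape, rel)            # e.g. (614, 3) (614, 3) [dx dy dz]
```

In a cross-reconstruction setup of this kind, one view and the relative offset
would be fed to the model, which is trained to reconstruct the other view,
rather than to recover masked points of the same view as in self-reconstruction.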