Towards More Diverse and Challenging Pre-training for Point Cloud Learning: Self-Supervised Cross Reconstruction with Decoupled Views
September 1, 2025
Authors: Xiangdong Zhang, Shaofeng Zhang, Junchi Yan
cs.AI
Abstract
Point cloud learning, especially in a self-supervised way without manual
labels, has gained growing attention in both vision and learning communities
due to its potential utility in a wide range of applications. Most existing
generative approaches for point cloud self-supervised learning focus on
recovering masked points from visible ones within a single view. A two-view
pre-training paradigm, by contrast, inherently introduces greater diversity
and variance, which may enable more challenging and informative pre-training.
Motivated by this, we explore the potential of two-view learning in this domain.
In this paper, we propose Point-PQAE, a cross-reconstruction generative
paradigm that first generates two decoupled point clouds/views and then
reconstructs one from the other. To achieve this, we develop the first crop
mechanism for point cloud view generation and further propose a novel
positional encoding to represent the 3D relative position
between the two decoupled views. The cross-reconstruction significantly
increases the difficulty of pre-training compared to self-reconstruction, which
enables our method to surpass previous single-modal self-reconstruction methods
in 3D self-supervised learning. Specifically, it outperforms the
self-reconstruction baseline (Point-MAE) by 6.5%, 7.0%, and 6.7% on the three
variants of ScanObjectNN under the MLP-Linear evaluation protocol. The code is
available at https://github.com/aHapBean/Point-PQAE.
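To make the view-generation step described above concrete, below is a minimal
sketch of how two decoupled views and their 3D relative offset might be
produced. It assumes a nearest-neighbor crop around a random seed point and
uses the difference of view centroids as the relative position that a
positional encoding could embed; the function names and crop strategy are
illustrative assumptions, not the authors' released implementation (see the
repository linked above for that).

```python
# Illustrative sketch (not the authors' code): crop two decoupled views
# from one point cloud and compute their 3D relative offset.
import numpy as np

def crop_view(points: np.ndarray, ratio: float, rng: np.random.Generator) -> np.ndarray:
    """Crop one view: keep the `ratio` fraction of points nearest a random seed point."""
    seed = points[rng.integers(len(points))]        # random crop center
    dists = np.linalg.norm(points - seed, axis=1)   # distance of every point to the seed
    keep = np.argsort(dists)[: int(len(points) * ratio)]
    return points[keep]

def decoupled_views(points: np.ndarray, ratio: float = 0.6, seed: int = 0):
    """Generate two independently cropped (decoupled) views plus their relative offset."""
    rng = np.random.default_rng(seed)
    v1 = crop_view(points, ratio, rng)
    v2 = crop_view(points, ratio, rng)
    # Re-center each view so it carries no absolute position; the centroid
    # offset is what a relative positional encoding would then represent.
    c1, c2 = v1.mean(axis=0), v2.mean(axis=0)
    rel_pos = c2 - c1                               # 3D relative position between views
    return v1 - c1, v2 - c2, rel_pos

if __name__ == "__main__":
    pts = np.random.default_rng(42).standard_normal((1024, 3)).astype(np.float32)
    view1, view2, rel = decoupled_views(pts)
    print(view1.shape, view2.shape, rel)            # e.g. (614, 3) (614, 3) [dx dy dz]
```

In a cross-reconstruction setup of this kind, one view and the relative offset
would be fed to the model, which is trained to reconstruct the other view,
rather than to recover masked points of the same view as in self-reconstruction.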