UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting

Abstract

The scale diversity of point cloud data presents significant challenges in developing unified representation learning techniques for 3D vision. Currently, there are few unified 3D models, and no existing pre-training method is equally effective for both object- and scene-level point clouds. In this paper, we introduce UniPre3D, the first unified pre-training method that can be seamlessly applied to point clouds of any scale and 3D models of any architecture. Our approach predicts Gaussian primitives as the pre-training task and employs differentiable Gaussian splatting to render images, enabling precise pixel-level supervision and end-to-end optimization. To further regulate the complexity of the pre-training task and direct the model's focus toward geometric structures, we integrate 2D features from pre-trained image models to incorporate well-established texture knowledge. We validate the universal effectiveness of our proposed method through extensive experiments across a variety of object- and scene-level tasks, using diverse point cloud models as backbones. Code is available at https://github.com/wangzy22/UniPre3D.
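
To make the rendering step in the abstract more concrete, the following is a minimal toy sketch of front-to-back alpha compositing, the blending rule commonly used in Gaussian splatting. It is not the authors' implementation. It assumes the Gaussians are already projected to the image plane and are isotropic, and it uses plain Python rather than an autodiff framework, so it only illustrates the forward pass. The names `Gaussian2D`, `render`, and `mse` are hypothetical and chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian2D:
    """A Gaussian primitive already projected to the image plane (hypothetical layout)."""
    x: float        # center, in pixel coordinates
    y: float
    sigma: float    # isotropic standard deviation, in pixels (a simplification)
    opacity: float  # in [0, 1]
    color: tuple    # (r, g, b), each in [0, 1]
    depth: float    # distance from the camera, used only for sorting


def render(gaussians, width, height):
    """Front-to-back compositing: C = sum_i c_i * a_i * prod_{j<i} (1 - a_j)."""
    ordered = sorted(gaussians, key=lambda g: g.depth)  # nearest first
    image = []
    for py in range(height):
        row = []
        for px in range(width):
            rgb = [0.0, 0.0, 0.0]
            transmittance = 1.0  # fraction of light not yet absorbed
            for g in ordered:
                d2 = (px - g.x) ** 2 + (py - g.y) ** 2
                alpha = g.opacity * math.exp(-0.5 * d2 / g.sigma ** 2)
                for c in range(3):
                    rgb[c] += transmittance * alpha * g.color[c]
                transmittance *= 1.0 - alpha
            row.append(tuple(rgb))
        image.append(row)
    return image


def mse(rendered, target):
    """Mean squared pixel error, standing in for a pixel-level supervision signal."""
    total, count = 0.0, 0
    for row_r, row_t in zip(rendered, target):
        for pix_r, pix_t in zip(row_r, row_t):
            for a, b in zip(pix_r, pix_t):
                total += (a - b) ** 2
                count += 1
    return total / count
```

In a pre-training setup of the kind the abstract describes, the Gaussian parameters would be predicted by the 3D backbone and the pixel loss would be backpropagated through a differentiable rasterizer. This sketch leaves out camera projection, anisotropic covariances, and gradient computation.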
