UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting

Abstract

The scale diversity of point cloud data presents significant challenges in developing unified representation learning techniques for 3D vision. Currently, there are few unified 3D models, and no existing pre-training method is equally effective for both object- and scene-level point clouds. In this paper, we introduce UniPre3D, the first unified pre-training method that can be seamlessly applied to point clouds of any scale and 3D models of any architecture. Our approach predicts Gaussian primitives as the pre-training task and employs differentiable Gaussian splatting to render images, enabling precise pixel-level supervision and end-to-end optimization. To further regulate the complexity of the pre-training task and direct the model's focus toward geometric structures, we integrate 2D features from pre-trained image models to incorporate well-established texture knowledge. We validate the universal effectiveness of our proposed method through extensive experiments across a variety of object- and scene-level tasks, using diverse point cloud models as backbones. Code is available at https://github.com/wangzy22/UniPre3D.
