Point-MoE: Towards Cross-Domain Generalization in 3D Semantic Segmentation via Mixture-of-Experts

Abstract

While scaling laws have transformed natural language processing and computer vision, 3D point cloud understanding has yet to reach that stage. This can be attributed both to the comparatively small scale of 3D datasets and to the disparate sources of the data itself. Point clouds are captured by diverse sensors (e.g., depth cameras, LiDAR) across varied domains (e.g., indoor, outdoor), each introducing unique scanning patterns, sampling densities, and semantic biases. Such domain heterogeneity poses a major barrier to training unified models at scale, especially under the realistic constraint that domain labels are typically inaccessible at inference time. In this work, we propose Point-MoE, a Mixture-of-Experts architecture designed to enable large-scale, cross-domain generalization in 3D perception. We show that standard point cloud backbones degrade significantly in performance when trained on mixed-domain data, whereas Point-MoE with a simple top-k routing strategy can automatically specialize experts, even without access to domain labels. Our experiments demonstrate that Point-MoE not only outperforms strong multi-domain baselines but also generalizes better to unseen domains. This work highlights a scalable path forward for 3D understanding: letting the model discover structure in diverse 3D data, rather than imposing it via manual curation or domain supervision.
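To make the top-k routing idea concrete, the following is a minimal sketch of a generic top-k Mixture-of-Experts layer applied to a single point feature. It is not the Point-MoE implementation: the abstract does not specify the number of experts, the value of k, the expert architecture, or whether gate weights are normalized before or after selection. Here we assume linear experts, a linear router, and a softmax over the selected logits only; all names (`top_k_route`, `moe_layer`, `router_w`, `expert_ws`) are hypothetical. Note that the router sees only the feature vector, never a domain label.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their gates.

    Assumption: softmax is taken over the selected logits only.
    Returns a list of (expert_index, gate_weight) pairs.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    chosen = order[:k]
    gates = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, gates))


def moe_layer(feature, router_w, expert_ws, k=2):
    """Route one point feature through its top-k experts.

    router_w:  n_experts x d_in matrix producing one logit per expert.
    expert_ws: one d_out x d_in matrix per expert (hypothetical linear experts).
    """
    logits = matvec(router_w, feature)
    routes = top_k_route(logits, k)
    output = [0.0] * len(expert_ws[0])
    for expert_index, gate in routes:
        expert_out = matvec(expert_ws[expert_index], feature)
        output = [o + gate * y for o, y in zip(output, expert_out)]
    return output, routes


# Toy usage with random weights (illustrative sizes only).
random.seed(0)
d_in, d_out, n_experts = 4, 3, 8
router_w = [[random.gauss(0, 1) for _ in range(d_in)]
            for _ in range(n_experts)]
expert_ws = [[[random.gauss(0, 1) for _ in range(d_in)]
              for _ in range(d_out)]
             for _ in range(n_experts)]
point_feature = [0.5, -1.0, 0.25, 2.0]
output, routes = moe_layer(point_feature, router_w, expert_ws, k=2)
```

Because only k of the experts run for each point, compute per point stays roughly constant as the number of experts grows. Any specialization by sensor or scene type would have to emerge from training the router, which this sketch does not cover.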
