SplAttN: Bridging 2D and 3D with Gaussian Soft Splatting and Attention for Point Cloud Completion

Abstract

Multi-modal learning has advanced point cloud completion, but the theoretical mechanisms behind its success remain unclear. Recent work attributes that success to the connection between modalities, yet we find that standard hard projection severs this connection: projecting a sparse point cloud onto the image plane yields an extremely sparse support, which hinders the propagation of visual priors. We term this failure mode Cross-Modal Entropy Collapse. To address this practical limitation, we propose SplAttN, which replaces hard projection with Differentiable Gaussian Splatting to produce a dense, continuous image-plane representation. By reformulating projection as continuous density estimation, SplAttN avoids collapsed sparse support, facilitates gradient flow, and improves the learnability of the cross-modal connection. Extensive experiments show that SplAttN achieves state-of-the-art performance on PCN and ShapeNet-55/34. Crucially, we use the real-world KITTI benchmark as a stress test for multi-modal reliance. Counterfactual evaluation reveals that, whereas baselines degenerate into unimodal template retrievers that are insensitive to the removal of visual input, SplAttN maintains a robust dependency on visual cues, validating that our method establishes an effective cross-modal connection. Code is available at https://github.com/zay002/SplAttN.
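
The contrast between hard projection and soft splatting can be illustrated with a toy sketch. It is not the SplAttN implementation. The orthographic camera, the isotropic kernel, the value of `sigma`, and every function name below are assumptions made only for illustration. The sketch projects a small random point cloud onto a 64×64 grid and compares how much of the image plane receives a non-zero value under each scheme.

```python
import math
import random


def project(points, size):
    """Assumed orthographic camera: drop z, map x and y from [-1, 1] to pixels."""
    scale = (size - 1) / 2.0
    return [((x + 1.0) * scale, (y + 1.0) * scale) for x, y, _z in points]


def hard_projection(points, size):
    """Each point marks only the single pixel it rounds to."""
    image = [[0.0] * size for _ in range(size)]
    for u, v in project(points, size):
        image[int(round(v))][int(round(u))] = 1.0
    return image


def gaussian_splat(points, size, sigma=2.0):
    """Each point spreads an isotropic Gaussian footprint over nearby pixels."""
    image = [[0.0] * size for _ in range(size)]
    radius = int(math.ceil(3 * sigma))  # truncate the kernel at 3 sigma
    for u, v in project(points, size):
        cu, cv = int(round(u)), int(round(v))
        for row in range(max(0, cv - radius), min(size, cv + radius + 1)):
            for col in range(max(0, cu - radius), min(size, cu + radius + 1)):
                dist_sq = (col - u) ** 2 + (row - v) ** 2
                image[row][col] += math.exp(-dist_sq / (2.0 * sigma ** 2))
    return image


def coverage(image):
    """Fraction of pixels with a non-zero value, i.e. the size of the support."""
    total = sum(len(row) for row in image)
    return sum(1 for row in image for value in row if value > 0.0) / total


random.seed(0)
# Hypothetical sparse input: 64 points drawn uniformly from the cube [-1, 1]^3.
cloud = [tuple(random.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(64)]
hard = hard_projection(cloud, size=64)
soft = gaussian_splat(cloud, size=64, sigma=2.0)
print(f"hard coverage: {coverage(hard):.3f}")
print(f"soft coverage: {coverage(soft):.3f}")
```

With 64 points, the hard projection can light at most 64 of the 4096 pixels, while the splatted image has a much larger support. The splatted value also varies smoothly with the continuous pixel coordinates `u` and `v`, which is the property that lets gradients reach the point positions when the same computation is written in an automatic-differentiation framework. Plain Python does not compute those gradients, so this sketch shows only the density of the support.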