
**SplAttN: Bridging 2D and 3D with Gaussian Soft Splatting and Attention for Point Cloud Completion**

May 2, 2026
Authors: Zhaoyang Li, Zhichao You, Tianrui Li
cs.AI

Abstract

Although multi-modal learning has advanced point cloud completion, its theoretical mechanisms remain unclear. Recent works attribute this success to the connection between modalities, yet we identify that standard hard projection severs this connection: projecting a sparse point cloud onto the image plane yields an extremely sparse support set, which hinders the propagation of visual priors, a failure mode we term Cross-Modal Entropy Collapse. To address this practical limitation, we propose SplAttN, which replaces hard projection with Differentiable Gaussian Splatting to produce a dense, continuous image-plane representation. By reformulating projection as continuous density estimation, SplAttN avoids the collapsed sparse support, facilitates gradient flow, and improves the learnability of cross-modal connections. Extensive experiments show that SplAttN achieves state-of-the-art performance on PCN and ShapeNet-55/34. Crucially, we use the real-world KITTI benchmark as a stress test of multi-modal reliance: counterfactual evaluation reveals that while baselines degenerate into unimodal template retrievers insensitive to the removal of visual input, SplAttN maintains a robust dependency on visual cues, validating that our method establishes an effective cross-modal connection. Code is available at https://github.com/zay002/SplAttN.
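
To make the contrast concrete, below is a minimal PyTorch sketch of the two projection schemes the abstract compares, under assumed simplifications: points already lie in 2D image-plane coordinates, and the Gaussian footprints are isotropic with a fixed width. The names `hard_project` and `gaussian_splat` and the chosen `sigma` are illustrative placeholders, not the authors' implementation.

```python
import torch

def hard_project(uv: torch.Tensor, H: int, W: int) -> torch.Tensor:
    """Hard projection: scatter each point to its nearest pixel.

    The result is a sparse binary occupancy map, and the rounding step
    blocks gradients with respect to the point coordinates.
    """
    occ = torch.zeros(H, W, dtype=uv.dtype)
    u = uv[:, 0].round().long().clamp(0, W - 1)
    v = uv[:, 1].round().long().clamp(0, H - 1)
    occ[v, u] = 1.0
    return occ

def gaussian_splat(uv: torch.Tensor, H: int, W: int,
                   sigma: float = 2.0) -> torch.Tensor:
    """Soft splatting: each point spreads a Gaussian kernel over the plane.

    Summing the kernels turns projection into continuous density
    estimation: the map is dense, smooth, and differentiable w.r.t. uv.
    """
    ys = torch.arange(H, dtype=uv.dtype)
    xs = torch.arange(W, dtype=uv.dtype)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")  # (H, W) pixel grid
    # Squared distance from every pixel to every point: (H, W, N).
    d2 = (gx.unsqueeze(-1) - uv[:, 0]) ** 2 + (gy.unsqueeze(-1) - uv[:, 1]) ** 2
    return torch.exp(-d2 / (2.0 * sigma ** 2)).sum(dim=-1)  # density (H, W)

# 256 sparse points on a 128x128 plane: the hard map touches only a
# handful of pixels, while the soft map has dense support.
uv = torch.rand(256, 2) * 128
hard = hard_project(uv, 128, 128)
soft = gaussian_splat(uv.requires_grad_(), 128, 128)
print(f"hard nonzero: {hard.count_nonzero().item()}/{hard.numel()}")
print(f"soft nonzero: {soft.count_nonzero().item()}/{soft.numel()}")
soft.sum().backward()  # uv.grad is populated; hard_project offers no such path
```

Running the snippet shows the sparse-support failure directly: the hard map is nonzero at no more than 256 pixels, while the soft map assigns every point a smooth footprint and passes gradients back to the point coordinates, which is the dense, differentiable support the abstract argues is needed for visual priors to propagate.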