SplAttN: Bridging 2D and 3D with Gaussian Soft Splatting and Attention for Point Cloud Completion
May 2, 2026
Authors: Zhaoyang Li, Zhichao You, Tianrui Li
cs.AI
Abstract
Although multi-modal learning has advanced point cloud completion, the theoretical mechanisms remain unclear. Recent works attribute success to the connection between modalities, yet we identify that standard hard projection severs this connection: projecting a sparse point cloud onto the image plane yields an extremely sparse support set, which hinders visual prior propagation, a failure mode we term Cross-Modal Entropy Collapse. To address this practical limitation, we propose SplAttN, which replaces hard projection with Differentiable Gaussian Splatting to produce a dense, continuous image-plane representation. By reformulating projection as continuous density estimation, SplAttN avoids the collapsed sparse support set, facilitates gradient flow, and improves cross-modal connection learnability. Extensive experiments show that SplAttN achieves state-of-the-art performance on PCN and ShapeNet-55/34. Crucially, we use the real-world KITTI benchmark as a stress test for multi-modal reliance. Counterfactual evaluation reveals that while baselines degenerate into unimodal template retrievers insensitive to visual removal, SplAttN maintains a robust dependency on visual cues, validating that our method establishes an effective cross-modal connection. Code is available at https://github.com/zay002/SplAttN.
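The contrast between hard projection and Gaussian soft splatting can be illustrated with a minimal NumPy sketch. This is not the paper's implementation (the function names, grid size, and `sigma` are illustrative assumptions): it only shows why rasterizing a sparse point set pixel-by-pixel leaves most of the image plane empty, while spreading each point as a Gaussian density yields a dense, continuous support.

```python
import numpy as np

def hard_projection(pts2d, H, W):
    """Hard rasterization: each projected point occupies a single pixel,
    so N points can cover at most N of the H*W pixels (sparse support)."""
    img = np.zeros((H, W))
    xs = np.clip(np.round(pts2d[:, 0]).astype(int), 0, W - 1)
    ys = np.clip(np.round(pts2d[:, 1]).astype(int), 0, H - 1)
    img[ys, xs] = 1.0
    return img

def gaussian_splat(pts2d, H, W, sigma=2.0):
    """Soft rasterization: each point contributes an isotropic Gaussian
    density to every pixel, giving a dense, differentiable representation."""
    ys, xs = np.mgrid[0:H, 0:W]
    img = np.zeros((H, W), dtype=float)
    for px, py in pts2d:
        img += np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2.0 * sigma ** 2))
    return img

rng = np.random.default_rng(0)
pts = rng.uniform(0, 64, size=(128, 2))  # a sparse 2D projection of a partial cloud
hard = hard_projection(pts, 64, 64)
soft = gaussian_splat(pts, 64, 64)

# Fraction of pixels with non-negligible signal: hard projection touches
# at most 128/4096 pixels, while the splatted density covers far more.
print("hard support fraction:", np.mean(hard > 1e-3))
print("soft support fraction:", np.mean(soft > 1e-3))
```

Because the Gaussian kernel is smooth in the point coordinates, gradients flow from image-plane features back to the 3D points, which is the property the hard, non-differentiable scatter lacks.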