LitePT: Lighter Yet Stronger Point Transformer
December 15, 2025
Authors: Yuanwen Yue, Damien Robert, Jianyuan Wang, Sunghwan Hong, Jan Dirk Wegner, Christian Rupprecht, Konrad Schindler
cs.AI
Abstract
Modern neural architectures for 3D point cloud processing contain both convolutional layers and attention blocks, but the best way to assemble them remains unclear. We analyse the role of different computational blocks in 3D point cloud networks and find an intuitive behaviour: convolution is adequate to extract low-level geometry at high resolution in early layers, where attention is expensive without bringing any benefit, whereas attention captures high-level semantics and context more efficiently in deep, low-resolution layers. Guided by this design principle, we propose a new, improved 3D point cloud backbone that employs convolutions in early stages and switches to attention for deeper layers. To avoid losing spatial layout information when discarding redundant convolution layers, we introduce a novel, training-free 3D positional encoding, PointROPE. The resulting LitePT model has 3.6× fewer parameters, runs 2× faster, and uses 2× less memory than the state-of-the-art Point Transformer V3, yet matches or even outperforms it on a range of tasks and datasets. Code and models are available at: https://github.com/prs-eth/LitePT.
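To make the idea of a training-free rotary 3D positional encoding concrete, here is a minimal NumPy sketch. It assumes that PointROPE follows the common axis-wise extension of rotary position embedding (RoPE): the channel dimension is split into three equal groups, and each group is rotated pairwise by angles proportional to one coordinate (x, y or z). The function names `rope_1d` and `point_rope`, the even three-way split, and the frequency base (borrowed from the usual 1D RoPE default) are hypothetical choices for illustration, not the authors' implementation; the linked repository is the reference for that.

```python
import numpy as np


def rope_1d(x, pos, base=10000.0):
    """Rotate consecutive channel pairs of x by angles pos * theta_i.

    x:   (N, d) features, d even
    pos: (N,) one scalar coordinate per point
    base: frequency base (assumed; standard 1D RoPE default)
    """
    x = np.asarray(x, dtype=np.float64)
    pos = np.asarray(pos, dtype=np.float64)
    n, d = x.shape
    assert d % 2 == 0, "RoPE needs an even number of channels"
    # One frequency per channel pair: theta_i = base^(-2i/d)
    theta = base ** (-np.arange(0, d, 2) / d)        # (d/2,)
    angles = pos[:, None] * theta[None, :]           # (N, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    # 2D rotation of each (even, odd) channel pair
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out


def point_rope(x, coords, base=10000.0):
    """Hypothetical axis-wise 3D rotary encoding (no learnable parameters).

    x:      (N, d) query or key features, d divisible by 6
    coords: (N, 3) point coordinates (x, y, z)
    """
    x = np.asarray(x, dtype=np.float64)
    coords = np.asarray(coords, dtype=np.float64)
    n, d = x.shape
    assert d % 6 == 0, "need three channel groups of even size"
    g = d // 3
    # Group k is rotated according to coordinate axis k only
    parts = [rope_1d(x[:, k * g:(k + 1) * g], coords[:, k], base) for k in range(3)]
    return np.concatenate(parts, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((1, 12))
    k = rng.standard_normal((1, 12))
    p_q = np.array([[1.0, 2.0, 3.0]])
    p_k = np.array([[4.0, 0.5, -1.0]])
    shift = np.array([[10.0, -3.0, 7.5]])
    # Attention logits before and after translating both points by the same offset
    s1 = point_rope(q, p_q) @ point_rope(k, p_k).T
    s2 = point_rope(q, p_q + shift) @ point_rope(k, p_k + shift).T
    print(np.allclose(s1, s2))
```

Because every rotation is a fixed function of the coordinates, the encoding adds no trainable weights, and the query-key dot product depends only on the relative offset between two points along each axis. The demo checks this by translating both points by the same vector and confirming that the attention logit is unchanged.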