Lite3R: A Model-Agnostic Framework for Efficient Feed-Forward 3D Reconstruction
May 12, 2026
Authors: Haoyu Zhang, Zeyu Zhang, Zedong Zhou, Yang Zhao, Hao Tang
cs.AI
Abstract
Transformer-based 3D reconstruction has emerged as a powerful paradigm for recovering geometry and appearance from multi-view observations, offering strong performance across challenging visual conditions. As these models scale to larger backbones and higher-resolution inputs, improving their efficiency becomes increasingly important for practical deployment. However, modern 3D transformer pipelines face two coupled challenges: dense multi-view attention creates substantial token-mixing overhead, and low-precision execution can destabilize geometry-sensitive representations, degrading depth, pose, and 3D consistency. To address the first challenge, we propose Lite3R, a model-agnostic teacher-student framework that replaces dense attention with Sparse Linear Attention, preserving important geometric interactions while reducing attention cost. To address the second challenge, we introduce a parameter-efficient FP8-aware quantization-aware training (FP8-aware QAT) strategy with partial attention distillation, which freezes the vast majority of pretrained backbone parameters and trains only lightweight linear-branch projection layers, enabling stable low-precision deployment while retaining pretrained geometric priors. We evaluate Lite3R on two representative backbones, VGGT and DA3-Large, on the BlendedMVS and DTU64 benchmarks, showing that it substantially reduces latency (1.7-2.0x) and memory usage (1.9-2.4x) while preserving competitive reconstruction quality overall. These results demonstrate that Lite3R provides an effective algorithm-system co-design approach for practical transformer-based 3D reconstruction. Code: https://github.com/AIGeeksGroup/Lite3R. Website: https://aigeeksgroup.github.io/Lite3R.
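As background for the attention replacement described above, the sketch below shows kernelized linear attention combined with a simple key-pruning step, which is the general mechanism by which sparse linear attention reduces the O(N^2) cost of dense token mixing to roughly O(Nk). The positive feature map, the norm-based top-k selection rule, and all names here are illustrative assumptions, not the actual Sparse Linear Attention design used in Lite3R.

```python
import numpy as np

def feature_map(x):
    # Positive feature map in the elu(x) + 1 style; an assumption,
    # not the kernel used by Lite3R.
    return np.where(x > 0, x + 1.0, np.exp(x))

def sparse_linear_attention(Q, K, V, k_keep):
    """Linear attention restricted to the k_keep highest-norm keys.

    Dense softmax attention costs O(N^2 d); factoring the product as
    phi(Q) @ (phi(K)^T V) over a pruned key set costs O(N * k_keep * d).
    """
    # Crude sparsification: keep the k_keep keys with largest L2 norm
    # (illustrative only; real selection rules are learned or structured).
    idx = np.argsort(-np.linalg.norm(K, axis=-1))[:k_keep]
    Ks, Vs = K[idx], V[idx]
    q, k = feature_map(Q), feature_map(Ks)
    kv = k.T @ Vs                  # (d, d_v) summary of the selected keys/values
    z = q @ k.sum(axis=0)          # per-query normalizer
    return (q @ kv) / (z[:, None] + 1e-6)

rng = np.random.default_rng(0)
N, d = 64, 16
Q, K, V = rng.normal(size=(N, d)), rng.normal(size=(N, d)), rng.normal(size=(N, d))
out = sparse_linear_attention(Q, K, V, k_keep=16)
print(out.shape)  # (64, 16)
```

Because the key/value summary `kv` is a d x d matrix independent of sequence length, the per-query cost stays constant as the number of multi-view tokens grows, which is what makes this family of methods attractive for high-resolution 3D transformers.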
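To illustrate the mechanics behind FP8-aware QAT, the sketch below fake-quantizes a weight tensor to an FP8 E4M3-like grid in float32 (per-tensor scaling to the E4M3 maximum of 448, then rounding the mantissa to 3 bits). In real QAT this forward-pass rounding is paired with a straight-through estimator so gradients flow to the trainable projection layers; the scaling scheme and function names here are assumptions for illustration, not Lite3R's recipe, and the E4M3 exponent range is not fully modeled.

```python
import numpy as np

def fake_quant_e4m3(x, eps=1e-12):
    """Simulate FP8 E4M3-style rounding in float32 (fake quantization).

    Scales the tensor so its max magnitude maps to 448 (the E4M3 max),
    rounds the mantissa to 3 bits, and rescales back. Subnormal handling
    and the limited E4M3 exponent range are omitted for brevity.
    """
    scale = 448.0 / max(np.abs(x).max(), eps)
    xs = np.clip(x * scale, -448.0, 448.0)
    m, e = np.frexp(xs)              # xs = m * 2**e with |m| in [0.5, 1)
    m = np.round(m * 16.0) / 16.0    # keep 1 implicit + 3 mantissa bits
    return np.ldexp(m, e) / scale

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8)).astype(np.float32)
Wq = fake_quant_e4m3(W)

# With a 3-bit mantissa, rounding error is bounded by ~1/16 relative error.
rel_err = np.abs(Wq - W) / np.maximum(np.abs(W), 1e-6)
print(rel_err.max() < 0.07)  # True
```

The bounded relative error is why low-precision execution is viable at all, but it also shows why geometry-sensitive quantities (depths, poses) can drift when every layer rounds this way, motivating the distillation and the freezing of pretrained backbone weights described in the abstract.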