Lite3R: A Model-Agnostic Framework for Efficient Feed-Forward 3D Reconstruction
May 12, 2026
Authors: Haoyu Zhang, Zeyu Zhang, Zedong Zhou, Yang Zhao, Hao Tang
cs.AI
Abstract
Transformer-based 3D reconstruction has emerged as a powerful paradigm for recovering geometry and appearance from multi-view observations, offering strong performance across challenging visual conditions. As these models scale to larger backbones and higher-resolution inputs, improving their efficiency becomes increasingly important for practical deployment. However, modern 3D transformer pipelines face two coupled challenges: dense multi-view attention creates substantial token-mixing overhead, and low-precision execution can destabilize geometry-sensitive representations and degrade depth, pose, and 3D consistency. To address the first challenge, we propose Lite3R, a model-agnostic teacher-student framework that replaces dense attention with Sparse Linear Attention to preserve important geometric interactions while reducing attention cost. To address the second challenge, we introduce a parameter-efficient FP8-aware quantization-aware training (FP8-aware QAT) strategy with partial attention distillation, which freezes the vast majority of pretrained backbone parameters and trains only lightweight linear-branch projection layers, enabling stable low-precision deployment while retaining pretrained geometric priors. We further evaluate Lite3R on two representative backbones, VGGT and DA3-Large, on BlendedMVS and DTU64, showing that it substantially reduces latency (1.7-2.0x) and memory usage (1.9-2.4x) while preserving competitive reconstruction quality overall. These results demonstrate that Lite3R provides an effective algorithm-system co-design approach for practical transformer-based 3D reconstruction. Code: https://github.com/AIGeeksGroup/Lite3R. Website: https://aigeeksgroup.github.io/Lite3R.
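To make the efficiency argument concrete, the sketch below contrasts standard dense softmax attention (quadratic in the number of tokens N) with a generic kernelized linear attention that computes the same kind of token mixing in O(N d^2) by associating the matrix product the other way. This is a minimal illustration of the general linear-attention idea the abstract refers to, not Lite3R's actual Sparse Linear Attention module; the function names and the ReLU feature map are illustrative assumptions.

```python
import numpy as np

def dense_attention(Q, K, V):
    # Standard softmax attention: forms the full N x N score matrix,
    # so cost and memory grow quadratically with sequence length N.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, feature=lambda x: np.maximum(x, 0.0) + 1e-6):
    # Kernelized linear attention: out_i = phi(q_i) (sum_j phi(k_j) v_j^T)
    # normalized by phi(q_i) . sum_j phi(k_j). Because the (d x d_v) summary
    # Kf.T @ V is built first, the N x N score matrix is never materialized.
    Qf, Kf = feature(Q), feature(K)
    kv = Kf.T @ V                 # (d, d_v) summary of keys and values
    z = Qf @ Kf.sum(axis=0)       # (N,) per-query normalizer
    return (Qf @ kv) / z[:, None]
```

With a positive feature map, each output row of `linear_attention` is a convex combination of the value rows, just as in softmax attention, which is why such approximations can preserve the mixing structure while cutting cost from O(N^2 d) to O(N d^2).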