TTT3R: 3D Reconstruction as Test-Time Training
September 30, 2025
Authors: Xingyu Chen, Yue Chen, Yuliang Xiu, Andreas Geiger, Anpei Chen
cs.AI
Abstract
Modern Recurrent Neural Networks have become a competitive architecture for
3D reconstruction due to their linear-time complexity. However, their
performance degrades significantly when applied beyond the training context
length, revealing limited length generalization. In this work, we revisit
3D reconstruction foundation models from a Test-Time Training perspective,
framing their designs as an online learning problem. Building on this
perspective, we leverage the alignment confidence between the memory state and
incoming observations to derive a closed-form learning rate for memory updates,
to balance between retaining historical information and adapting to new
observations. This training-free intervention, termed TTT3R, substantially
improves length generalization, achieving a 2× improvement in global
pose estimation over baselines, while operating at 20 FPS with just 6 GB of GPU
memory to process thousands of images. Code is available at
https://rover-xingyu.github.io/TTT3R.
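The core idea of a confidence-dependent learning rate for memory updates can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the cosine-based confidence gate, and the rank-1 delta-rule update are all hypothetical stand-ins for the paper's closed-form rule, chosen only to show how alignment between the memory's prediction and the incoming observation can modulate the step size.

```python
import numpy as np

def ttt_style_update(S, k, v):
    """One online memory update with a confidence-derived step size.

    S : (d, d) memory state mapping keys to values
    k : (d,)   incoming key (observation descriptor), assumed unit-norm
    v : (d,)   incoming value to associate with k

    The gate `alpha` is a hypothetical stand-in for the paper's
    closed-form learning rate: if the memory's current prediction
    S @ k already aligns with v (high confidence), alpha shrinks and
    history is retained; if it disagrees, alpha grows toward 1 and
    the memory adapts to the new observation.
    """
    pred = S @ k
    # cosine-style alignment confidence mapped into [0, 1]
    denom = np.linalg.norm(pred) * np.linalg.norm(v) + 1e-8
    conf = 0.5 * (1.0 + pred @ v / denom)
    alpha = 1.0 - conf  # low alignment -> large update step
    # rank-1 delta-rule update pulling the memory toward (k -> v)
    S = S + alpha * np.outer(v - pred, k)
    return S, alpha

rng = np.random.default_rng(0)
d = 8
S = np.zeros((d, d))
k = rng.standard_normal(d)
k /= np.linalg.norm(k)
v = rng.standard_normal(d)

S, alpha1 = ttt_style_update(S, k, v)  # empty memory: large step
_, alpha2 = ttt_style_update(S, k, v)  # repeated observation: smaller step
```

Repeating the same observation shrinks the step size, which is the retain-versus-adapt balance the abstract describes: a constant learning rate would either overwrite history or fail to incorporate new views.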