TTT3R: 3D Reconstruction as Test-Time Training

September 30, 2025
Authors: Xingyu Chen, Yue Chen, Yuliang Xiu, Andreas Geiger, Anpei Chen
cs.AI

Abstract

Modern Recurrent Neural Networks have become a competitive architecture for 3D reconstruction due to their linear-time complexity. However, their performance degrades significantly when applied beyond the training context length, revealing limited length generalization. In this work, we revisit 3D reconstruction foundation models from a Test-Time Training perspective, framing their design as an online learning problem. Building on this perspective, we leverage the alignment confidence between the memory state and incoming observations to derive a closed-form learning rate for memory updates, balancing the retention of historical information with adaptation to new observations. This training-free intervention, termed TTT3R, substantially improves length generalization, achieving a 2× improvement in global pose estimation over baselines while operating at 20 FPS with just 6 GB of GPU memory to process thousands of images. Code is available at https://rover-xingyu.github.io/TTT3R
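To make the test-time-training reading of the abstract concrete, the sketch below shows a generic confidence-gated online memory update, not the paper's actual implementation: the recurrent memory S is treated as a linear key-value map trained online with a delta-rule gradient step, and the step size is gated by an alignment-confidence term. The function name ttt_style_update, the temperature beta, and the exponential form of the confidence are illustrative assumptions; TTT3R's closed-form learning rate is derived differently in the paper.

```python
import numpy as np

def ttt_style_update(S, k, v, beta=1.0):
    """One online memory update with a confidence-dependent learning rate.

    S    : (d_k, d_v) memory state mapping keys to values.
    k    : (d_k,) key for the incoming observation (assumed L2-normalized).
    v    : (d_v,) value for the incoming observation.
    beta : temperature on the alignment confidence (hypothetical knob).
    """
    # Alignment confidence: how well the current memory already
    # explains the new observation.
    v_pred = S.T @ k                                  # memory's prediction for this key
    residual = v - v_pred                             # what the memory gets wrong
    conf = np.exp(-beta * np.linalg.norm(residual))   # high confidence -> small update
    lr = 1.0 - conf                                   # gate in [0, 1)

    # Delta-rule gradient step on 1/2 * ||S^T k - v||^2, scaled by the gate:
    # retains the state when confidence is high, adapts when it is low.
    S = S + lr * np.outer(k, residual)
    return S, lr

if __name__ == "__main__":
    # Toy stream of observations (stand-ins for per-frame tokens).
    rng = np.random.default_rng(0)
    S = np.zeros((8, 4))
    for _ in range(1000):
        k = rng.normal(size=8)
        k /= np.linalg.norm(k)
        v = rng.normal(size=4)
        S, lr = ttt_style_update(S, k, v)
```

In this reading, a well-explained observation yields high confidence and a near-zero step, so historical information is retained, while a poorly explained one drives the gate toward 1 and the update toward full adaptation, which is the retention-adaptation trade-off the abstract describes.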