TTT3R: 3D Reconstruction as Test-Time Training

September 30, 2025
Authors: Xingyu Chen, Yue Chen, Yuliang Xiu, Andreas Geiger, Anpei Chen
cs.AI

Abstract

Modern Recurrent Neural Networks have become a competitive architecture for 3D reconstruction due to their linear-time complexity. However, their performance degrades significantly when applied beyond the training context length, revealing limited length generalization. In this work, we revisit 3D reconstruction foundation models from a Test-Time Training perspective, framing their design as an online learning problem. Building on this perspective, we leverage the alignment confidence between the memory state and incoming observations to derive a closed-form learning rate for memory updates, balancing the retention of historical information with adaptation to new observations. This training-free intervention, termed TTT3R, substantially improves length generalization, achieving a 2× improvement in global pose estimation over baselines while operating at 20 FPS with just 6 GB of GPU memory to process thousands of images. Code is available at https://rover-xingyu.github.io/TTT3R
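To make the test-time-training reading of the abstract concrete, the sketch below shows a generic confidence-gated online memory update, not the paper's actual implementation: the recurrent memory S is treated as a linear key-value map trained online with a delta-rule gradient step, and the step size is gated by an alignment-confidence term. The function name ttt_style_update, the temperature beta, and the exponential form of the confidence are illustrative assumptions; TTT3R's closed-form learning rate is derived differently in the paper.

```python
import numpy as np

def ttt_style_update(S, k, v, beta=1.0):
    """One online memory update with a confidence-dependent learning rate.

    S    : (d_k, d_v) memory state mapping keys to values.
    k    : (d_k,) key for the incoming observation (assumed L2-normalized).
    v    : (d_v,) value for the incoming observation.
    beta : temperature on the alignment confidence (hypothetical knob).
    """
    # Alignment confidence: how well the current memory already
    # explains the new observation.
    v_pred = S.T @ k                                  # memory's prediction for this key
    residual = v - v_pred                             # what the memory gets wrong
    conf = np.exp(-beta * np.linalg.norm(residual))   # high confidence -> small update
    lr = 1.0 - conf                                   # gate in [0, 1)

    # Delta-rule gradient step on 1/2 * ||S^T k - v||^2, scaled by the gate:
    # retains the state when confidence is high, adapts when it is low.
    S = S + lr * np.outer(k, residual)
    return S, lr

if __name__ == "__main__":
    # Toy stream of observations (stand-ins for per-frame tokens).
    rng = np.random.default_rng(0)
    S = np.zeros((8, 4))
    for _ in range(1000):
        k = rng.normal(size=8)
        k /= np.linalg.norm(k)
        v = rng.normal(size=4)
        S, lr = ttt_style_update(S, k, v)
```

In this reading, a well-explained observation yields high confidence and a near-zero step, so historical information is retained, while a poorly explained one drives the gate toward 1 and the update toward full adaptation, which is the retention-adaptation trade-off the abstract describes.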