TTT3R: 3D Reconstruction as Test-Time Training

Abstract

Modern Recurrent Neural Networks (RNNs) have become a competitive architecture for 3D reconstruction due to their linear-time complexity. However, their performance degrades significantly when applied beyond the training context length, revealing limited length generalization. In this work, we revisit 3D reconstruction foundation models from a Test-Time Training perspective, framing their designs as an online learning problem. Building on this perspective, we leverage the alignment confidence between the memory state and incoming observations to derive a closed-form learning rate for memory updates, balancing the retention of historical information against adaptation to new observations. This training-free intervention, termed TTT3R, substantially improves length generalization, achieving a 2× improvement in global pose estimation over baselines while operating at 20 FPS with just 6 GB of GPU memory to process thousands of images. Code is available at https://rover-xingyu.github.io/TTT3R
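
The abstract does not state the closed-form learning rate itself, so the following is only a minimal sketch of the general idea of a confidence-gated memory update, not the method's actual formula. It assumes that each memory token receives a scalar alignment score and that a sigmoid of that score acts as a per-token learning rate. The names `gated_update`, `state`, `candidate`, and `scores` are hypothetical and chosen for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_update(state, candidate, scores):
    """Blend each memory token toward its candidate update.

    state, candidate: lists of equal-length float vectors (one per memory token).
    scores: one alignment score per token (assumed given; how it is computed
            is not specified in the abstract).
    A low score keeps the old token (retain history); a high score moves
    the token toward the candidate (adapt to the new observation).
    """
    new_state = []
    for s_tok, c_tok, score in zip(state, candidate, scores):
        beta = sigmoid(score)  # per-token learning rate in (0, 1)
        new_state.append([s + beta * (c - s) for s, c in zip(s_tok, c_tok)])
    return new_state


if __name__ == "__main__":
    state = [[0.0, 0.0], [1.0, 1.0]]
    candidate = [[2.0, 4.0], [3.0, 5.0]]
    print(gated_update(state, candidate, [0.0, 0.0]))
```

With a score of zero the gate is 0.5, so each token moves halfway toward its candidate. Strongly negative scores leave the memory nearly untouched, which is the behavior one would want when an incoming observation aligns poorly with the stored state.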
