tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction

Abstract

We propose tttLRM, a novel large 3D reconstruction model that uses a Test-Time Training (TTT) layer to enable long-context, autoregressive 3D reconstruction with linear computational complexity, further scaling the model's capability. Our framework efficiently compresses multiple image observations into the fast weights of the TTT layer, forming an implicit 3D representation in latent space that can be decoded into various explicit formats, such as Gaussian Splats (GS), for downstream applications. The online learning variant of our model supports progressive 3D reconstruction and refinement from streaming observations. We demonstrate that pretraining on novel view synthesis tasks transfers effectively to explicit 3D modeling, resulting in improved reconstruction quality and faster convergence. Extensive experiments show that our method outperforms state-of-the-art approaches in feedforward 3D Gaussian reconstruction on both objects and scenes.
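
To make the idea of compressing observations into fast weights concrete, the following is a minimal sketch of a generic TTT-style update, not the model's actual layer. It assumes the fast weights are a single square matrix `W`, that each observation token has already been projected to a `(key, value)` pair, and that the update is one plain SGD step on a squared reconstruction loss. The abstract does not specify any of these choices, and the names `ttt_update`, `compress_stream`, and `query` are hypothetical.

```python
# Minimal sketch of a test-time-training (TTT) style fast-weight update.
# Assumptions (not from the paper): the fast weights are one dim x dim
# matrix W, trained with plain SGD on 0.5 * ||W @ key - value||^2.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def ttt_update(W, key, value, lr):
    """Take one gradient step on 0.5 * ||W @ key - value||^2 w.r.t. W."""
    pred = matvec(W, key)
    err = [p - v for p, v in zip(pred, value)]
    # The gradient is the outer product err * key^T.
    return [[w - lr * e * k for w, k in zip(row, key)]
            for row, e in zip(W, err)]

def compress_stream(tokens, dim, lr=0.5):
    """Fold a stream of (key, value) tokens into fast weights in one pass.

    Each token costs O(dim^2) regardless of how many tokens came before,
    so the total cost grows linearly with the number of tokens.
    """
    W = [[0.0] * dim for _ in range(dim)]
    for key, value in tokens:
        W = ttt_update(W, key, value, lr)
    return W

def query(W, q):
    """Read from the compressed state by applying the fast weights to q."""
    return matvec(W, q)

if __name__ == "__main__":
    stream = [([1.0, 0.0], [2.0, 3.0]), ([0.0, 1.0], [-1.0, 4.0])]
    W = compress_stream(stream, dim=2, lr=1.0)
    print(query(W, [1.0, 0.0]))
    print(query(W, [0.0, 1.0]))
```

Because the state `W` has a fixed size, new tokens from a stream can keep being folded in without revisiting earlier ones, which is the property that the online variant described above relies on. In this toy setting, with orthonormal keys and a learning rate of 1, each query recovers the value stored under its key exactly.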
