

tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction

February 23, 2026
Authors: Chen Wang, Hao Tan, Wang Yifan, Zhiqin Chen, Yuheng Liu, Kalyan Sunkavalli, Sai Bi, Lingjie Liu, Yiwei Hu
cs.AI

Abstract

We propose tttLRM, a novel large 3D reconstruction model that leverages a Test-Time Training (TTT) layer to enable long-context, autoregressive 3D reconstruction with linear computational complexity, thereby improving the model's scalability. Our framework efficiently compresses multiple image observations into the fast weights of the TTT layer, forming an implicit 3D representation in the latent space that can be decoded into various explicit formats, such as Gaussian Splats (GS), for downstream applications. The online-learning variant of our model supports progressive 3D reconstruction and refinement from streaming observations. We demonstrate that pretraining on novel view synthesis tasks transfers effectively to explicit 3D modeling, yielding improved reconstruction quality and faster convergence. Extensive experiments show that our method outperforms state-of-the-art approaches in feedforward 3D Gaussian reconstruction on both objects and scenes.
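
For intuition on the mechanism the abstract describes, below is a minimal PyTorch sketch of a test-time-training layer that updates a fast-weight matrix once per incoming token, which is what yields linear cost in sequence length and makes streaming, autoregressive processing natural. This is an illustration only, not the paper's architecture: the names (`TTTLayer`, `to_k`, `to_v`, `eta`) and the choice of a linear fast-weight model trained with an L2 reconstruction loss are assumptions.

```python
# Minimal sketch of a test-time-training (TTT) fast-weight layer.
# Assumption: a linear fast-weight model W optimized at test time with a
# per-token L2 reconstruction loss; all names here are illustrative.
import torch


class TTTLayer(torch.nn.Module):
    def __init__(self, d_model: int, eta: float = 0.1):
        super().__init__()
        # Slow (pretrained) projections that define the self-supervised task.
        self.to_k = torch.nn.Linear(d_model, d_model, bias=False)
        self.to_v = torch.nn.Linear(d_model, d_model, bias=False)
        self.eta = eta  # test-time learning rate for the fast weights

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (seq_len, d_model). The fast weights W start at zero and
        # receive one gradient step per token, so the total cost is linear
        # in sequence length (unlike quadratic attention).
        d = tokens.shape[-1]
        W = torch.zeros(d, d, device=tokens.device)
        outputs = []
        for x in tokens:  # streaming-friendly: one observation at a time
            k, v = self.to_k(x), self.to_v(x)
            # One gradient step on the per-token loss ||W k - v||^2,
            # whose gradient w.r.t. W is 2 (W k - v) k^T.
            err = W @ k - v
            W = W - self.eta * 2.0 * torch.outer(err, k)
            outputs.append(W @ k)  # read out with the updated fast weights
        return torch.stack(outputs)
```

In this reading, the fast weights `W` play the role of the compressed context: every new observation refines them, so earlier views never need to be revisited, which matches the abstract's claim of progressive reconstruction and refinement from streaming inputs.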