tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction

Abstract

We propose tttLRM, a novel large 3D reconstruction model that uses a Test-Time Training (TTT) layer to perform long-context, autoregressive 3D reconstruction with linear computational complexity, which lets the model scale to larger inputs. Our framework efficiently compresses multiple image observations into the fast weights of the TTT layer, forming an implicit 3D representation in latent space. This representation can be decoded into various explicit formats, such as Gaussian Splats (GS), for downstream applications. The online learning variant of our model supports progressive 3D reconstruction and refinement from streaming observations. We demonstrate that pretraining on novel view synthesis tasks transfers effectively to explicit 3D modeling, improving reconstruction quality and speeding up convergence. Extensive experiments show that our method outperforms state-of-the-art approaches in feedforward 3D Gaussian reconstruction on both objects and scenes.
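The idea of compressing observations into fast weights can be illustrated with a toy example. The sketch below is not the tttLRM architecture: it assumes a generic linear fast-weight memory updated by one gradient step per token on a squared reconstruction loss, and every name in it (`matvec`, `ttt_update`, `compress`, the toy `tokens`) is hypothetical. It is meant only to show why the cost per observation stays constant, so the total cost grows linearly with the number of observations.

```python
# Illustrative sketch only: a linear fast-weight memory updated by one
# gradient step per token. This is an assumed toy setup, not the model
# described in the abstract.

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def ttt_update(W, key, value, lr):
    """Take one gradient step on the loss 0.5 * ||W @ key - value||^2.

    The gradient with respect to W is outer(W @ key - value, key).
    """
    err = [p - v for p, v in zip(matvec(W, key), value)]
    return [[w - lr * e * k for w, k in zip(row, key)] for row, e in zip(W, err)]


def compress(tokens, dim, lr=1.0):
    """Fold a stream of (key, value) pairs into a dim x dim fast-weight matrix.

    Each token costs O(dim^2) no matter how many tokens came before it,
    so the total cost is linear in the number of tokens.
    """
    W = [[0.0] * dim for _ in range(dim)]
    for key, value in tokens:
        W = ttt_update(W, key, value, lr)
    return W


# Toy stream: two orthonormal keys, each paired with a value to memorize.
tokens = [([1.0, 0.0], [2.0, 3.0]), ([0.0, 1.0], [-1.0, 4.0])]
W = compress(tokens, dim=2)
readout = matvec(W, [1.0, 0.0])  # query the memory with the first key
```

In this toy setting, a newly arriving observation needs only one more call to `ttt_update` on the existing matrix, which is one way to picture refinement from a stream without reprocessing earlier inputs.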
