tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction

Abstract

We propose tttLRM, a novel large 3D reconstruction model that leverages a Test-Time Training (TTT) layer to enable long-context, autoregressive 3D reconstruction with linear computational complexity, further scaling the model's capability. Our framework efficiently compresses multiple image observations into the fast weights of the TTT layer, forming an implicit 3D representation in the latent space that can be decoded into various explicit formats, such as Gaussian Splats (GS), for downstream applications. The online learning variant of our model supports progressive 3D reconstruction and refinement from streaming observations. We demonstrate that pretraining on novel view synthesis tasks effectively transfers to explicit 3D modeling, resulting in improved reconstruction quality and faster convergence. Extensive experiments show that our method achieves superior performance in feedforward 3D Gaussian reconstruction compared to state-of-the-art approaches on both objects and scenes.
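The idea of compressing a stream of observations into fast weights can be illustrated with a toy example. The sketch below is not the tttLRM layer itself: it is a generic fast-weight memory, assuming a single linear map updated by one gradient step per token on the loss 0.5 * ||W k - v||^2. The names FastWeightMemory, update, and read are hypothetical and chosen for illustration only. Learned projections, chunked updates, and the decoder to Gaussian Splats are not modeled here.

```python
# Toy fast-weight memory (illustrative assumption, not the paper's layer).
# The state is one dim x dim matrix, so memory stays constant while the
# total cost grows linearly with the number of streamed tokens.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class FastWeightMemory:
    def __init__(self, dim, lr=1.0):
        self.W = [[0.0] * dim for _ in range(dim)]  # fast weights
        self.lr = lr

    def update(self, key, value):
        """One gradient step on 0.5 * ||W key - value||^2 with respect to W."""
        pred = matvec(self.W, key)
        err = [p - v for p, v in zip(pred, value)]
        for i, e in enumerate(err):
            for j, kj in enumerate(key):
                # The gradient is the outer product err * key^T.
                self.W[i][j] -= self.lr * e * kj

    def read(self, query):
        """Decode from the compressed state by applying W to a query."""
        return matvec(self.W, query)


# Stream two (key, value) observations one at a time.
memory = FastWeightMemory(dim=2, lr=1.0)
stream = [([1.0, 0.0], [2.0, 3.0]), ([0.0, 1.0], [-1.0, 4.0])]
for key, value in stream:
    memory.update(key, value)

print(memory.read([1.0, 0.0]))
```

With unit-norm, mutually orthogonal keys and a learning rate of 1, each update stores its value exactly without disturbing earlier ones. Real inputs are not orthogonal, so the memory is lossy in general, and this is where a learned update rule would matter.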
