Fast Spatial Memory with Elastic Test-Time Training

Abstract

Large Chunk Test-Time Training (LaCT) has shown strong performance on long-context 3D reconstruction, but its fully plastic inference-time updates remain vulnerable to catastrophic forgetting and overfitting. As a result, LaCT is typically instantiated with a single large chunk spanning the full input sequence, which falls short of the broader goal of handling arbitrarily long sequences in a single pass. We propose Elastic Test-Time Training, inspired by elastic weight consolidation, which stabilizes LaCT fast-weight updates with a Fisher-weighted elastic prior around a maintained anchor state. The anchor evolves as an exponential moving average of past fast weights to balance stability and plasticity. Based on this updated architecture, we introduce Fast Spatial Memory (FSM), an efficient and scalable model for 4D reconstruction that learns spatiotemporal representations from long observation sequences and renders novel view-time combinations. We pre-trained FSM on large-scale curated 3D/4D data to capture the dynamics and semantics of complex spatial environments. Extensive experiments show that FSM supports fast adaptation over long sequences and delivers high-quality 3D/4D reconstruction with smaller chunks while mitigating the camera-interpolation shortcut. Overall, we hope to advance LaCT beyond the bounded single-chunk setting toward robust multi-chunk adaptation, a necessary step for generalization to genuinely longer sequences, while substantially alleviating the activation-memory bottleneck.
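
The abstract gives no equations, so the following is only an illustrative sketch of how a Fisher-weighted elastic prior with an exponential-moving-average anchor could be combined in one fast-weight update. It assumes a diagonal Fisher estimate (as in standard elastic weight consolidation), a quadratic penalty of the form 0.5 * lam * F * (w - a)^2, and plain gradient descent. The function name `elastic_ttt_step` and the parameters `lr`, `lam`, and `ema_decay` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Vector = List[float]


def elastic_ttt_step(
    weights: Vector,
    chunk_grad: Vector,
    anchor: Vector,
    fisher: Vector,
    lr: float = 0.1,
    lam: float = 1.0,
    ema_decay: float = 0.9,
) -> Tuple[Vector, Vector]:
    """One hypothetical elastic update of the fast weights for a single chunk.

    weights    : current fast weights
    chunk_grad : gradient of the per-chunk test-time loss w.r.t. the weights
    anchor     : maintained anchor state the weights are pulled toward
    fisher     : diagonal importance estimate (assumed, one value per weight)
    """
    # Gradient of the assumed penalty 0.5 * lam * F_i * (w_i - a_i)^2
    # is lam * F_i * (w_i - a_i); it is added to the chunk-loss gradient.
    new_weights = [
        w - lr * (g + lam * f * (w - a))
        for w, g, a, f in zip(weights, chunk_grad, anchor, fisher)
    ]
    # The anchor follows the fast weights as an exponential moving average.
    new_anchor = [
        ema_decay * a + (1.0 - ema_decay) * w
        for a, w in zip(anchor, new_weights)
    ]
    return new_weights, new_anchor
```

In this sketch, a weight with a large Fisher value is pulled strongly back toward the anchor (stability), while a weight with a Fisher value near zero follows the chunk gradient freely (plasticity). Calling the function once per chunk, and feeding the returned weights and anchor into the next call, would correspond to the multi-chunk setting described above.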
