
Fast Spatial Memory with Elastic Test-Time Training

April 8, 2026
Authors: Ziqiao Ma, Xueyang Yu, Haoyu Zhen, Yuncong Yang, Joyce Chai, Chuang Gan
cs.AI

Abstract

Large Chunk Test-Time Training (LaCT) has shown strong performance on long-context 3D reconstruction, but its fully plastic inference-time updates remain vulnerable to catastrophic forgetting and overfitting. As a result, LaCT is typically instantiated with a single large chunk spanning the full input sequence, falling short of the broader goal of handling arbitrarily long sequences in a single pass. Inspired by elastic weight consolidation, we propose Elastic Test-Time Training, which stabilizes LaCT's fast-weight updates with a Fisher-weighted elastic prior around a maintained anchor state. The anchor evolves as an exponential moving average of past fast weights, balancing stability and plasticity. On top of this architecture, we introduce Fast Spatial Memory (FSM), an efficient and scalable model for 4D reconstruction that learns spatiotemporal representations from long observation sequences and renders novel view-time combinations. We pre-train FSM on large-scale curated 3D/4D data to capture the dynamics and semantics of complex spatial environments. Extensive experiments show that FSM supports fast adaptation over long sequences, delivers high-quality 3D/4D reconstruction with smaller chunks, and mitigates the camera-interpolation shortcut. Overall, we hope to advance LaCT beyond the bounded single-chunk setting toward robust multi-chunk adaptation, a necessary step for generalizing to genuinely longer sequences, while substantially alleviating the activation-memory bottleneck.
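The core update the abstract describes — a fast-weight step regularized by a Fisher-weighted elastic prior around an anchor that itself drifts as an exponential moving average — can be sketched as follows. This is a minimal NumPy illustration of the general idea, not the paper's implementation; the function name, hyperparameters, and the exact form of the update are assumptions.

```python
import numpy as np

def elastic_ttt_step(w, grad, fisher, anchor, lr=0.1, lam=1.0, ema=0.9):
    """One illustrative elastic test-time training step (hypothetical API).

    w      : current fast weights
    grad   : task-loss gradient w.r.t. the fast weights
    fisher : per-parameter Fisher information estimate (importance weights)
    anchor : consolidated anchor state the weights are pulled toward
    """
    # EWC-style elastic prior: Fisher-weighted pull toward the anchor,
    # so parameters deemed important resist drifting (stability) while
    # unimportant ones stay free to adapt (plasticity).
    elastic_grad = lam * fisher * (w - anchor)
    w_new = w - lr * (grad + elastic_grad)
    # The anchor evolves as an exponential moving average of past fast
    # weights, as described in the abstract.
    anchor_new = ema * anchor + (1.0 - ema) * w_new
    return w_new, anchor_new
```

In a multi-chunk setting, this step would be applied once per chunk: the gradient adapts the fast weights to the current chunk, while the elastic term keeps them near the slowly moving anchor, which is intended to curb the catastrophic forgetting that plagues fully plastic updates.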
April 10, 2026