Fast Spatial Memory with Elastic Test-Time Training
April 8, 2026
Authors: Ziqiao Ma, Xueyang Yu, Haoyu Zhen, Yuncong Yang, Joyce Chai, Chuang Gan
cs.AI
Abstract
Large Chunk Test-Time Training (LaCT) has shown strong performance on long-context 3D reconstruction, but its fully plastic inference-time updates remain vulnerable to catastrophic forgetting and overfitting. As a result, LaCT is typically instantiated with a single large chunk spanning the full input sequence, falling short of the broader goal of handling arbitrarily long sequences in a single pass. We propose Elastic Test-Time Training, inspired by elastic weight consolidation, which stabilizes LaCT's fast-weight updates with a Fisher-weighted elastic prior around a maintained anchor state. The anchor evolves as an exponential moving average of past fast weights, balancing stability and plasticity. Building on this architecture, we introduce Fast Spatial Memory (FSM), an efficient and scalable model for 4D reconstruction that learns spatiotemporal representations from long observation sequences and renders novel view-time combinations. We pre-train FSM on large-scale curated 3D/4D data to capture the dynamics and semantics of complex spatial environments. Extensive experiments show that FSM supports fast adaptation over long sequences and delivers high-quality 3D/4D reconstruction with smaller chunks while mitigating the camera-interpolation shortcut. Overall, we hope to advance LaCT beyond the bounded single-chunk setting toward robust multi-chunk adaptation, a necessary step for generalization to genuinely longer sequences, while substantially alleviating the activation-memory bottleneck.
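To make the mechanism concrete, here is a minimal sketch of one elastic test-time training step, assuming a diagonal Fisher estimate and plain gradient descent on the fast weights. The function name `elastic_ttt_step` and the hyperparameters `lam` (prior strength) and `beta` (anchor decay) are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def elastic_ttt_step(w_fast, w_anchor, fisher, grad_chunk,
                     lr=1e-2, lam=1.0, beta=0.99):
    """One hypothetical elastic test-time training step (a sketch).

    w_fast:     current fast weights
    w_anchor:   EMA anchor built from past fast weights
    fisher:     diagonal Fisher estimate, same shape as w_fast
    grad_chunk: gradient of the current chunk's TTT objective w.r.t. w_fast
    lam:        strength of the Fisher-weighted elastic prior (assumed)
    beta:       EMA decay for the anchor (assumed)
    """
    # Fisher-weighted elastic prior: pulls fast weights back toward the
    # anchor, more strongly along directions the Fisher marks as important.
    elastic_grad = lam * fisher * (w_fast - w_anchor)

    # Plastic update from the current chunk, regularized by the prior.
    w_fast = w_fast - lr * (grad_chunk + elastic_grad)

    # The anchor evolves as an exponential moving average of past fast
    # weights, trading stability (large beta) against plasticity (small beta).
    w_anchor = beta * w_anchor + (1.0 - beta) * w_fast
    return w_fast, w_anchor

# Toy usage: iterate over chunks of a long sequence rather than one
# full-sequence chunk, which is the multi-chunk setting the paper targets.
rng = np.random.default_rng(0)
w_fast = rng.normal(size=128)
w_anchor = w_fast.copy()
fisher = np.ones_like(w_fast)  # placeholder Fisher estimate
for _ in range(4):  # four chunks
    grad_chunk = rng.normal(size=128)  # stand-in for the TTT gradient
    w_fast, w_anchor = elastic_ttt_step(w_fast, w_anchor, fisher, grad_chunk)
```

Under these assumptions, the elastic prior restrains drift on directions important for earlier chunks, while the EMA anchor lets the reference point itself adapt slowly instead of staying frozen at initialization.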