Fast Spatial Memory with Elastic Test-Time Training

Abstract

Large Chunk Test-Time Training (LaCT) has shown strong performance on long-context 3D reconstruction, but its fully plastic inference-time updates remain vulnerable to catastrophic forgetting and overfitting. As a result, LaCT is typically instantiated with a single large chunk spanning the full input sequence, falling short of the broader goal of handling arbitrarily long sequences in a single pass. We propose Elastic Test-Time Training, inspired by elastic weight consolidation, which stabilizes LaCT fast-weight updates with a Fisher-weighted elastic prior around a maintained anchor state. The anchor evolves as an exponential moving average of past fast weights to balance stability and plasticity. Based on this updated architecture, we introduce Fast Spatial Memory (FSM), an efficient and scalable model for 4D reconstruction that learns spatiotemporal representations from long observation sequences and renders novel view-time combinations. We pre-trained FSM on large-scale curated 3D/4D data to capture the dynamics and semantics of complex spatial environments. Extensive experiments show that FSM supports fast adaptation over long sequences and delivers high-quality 3D/4D reconstruction with smaller chunks, while mitigating the camera-interpolation shortcut. Overall, we hope to advance LaCT beyond the bounded single-chunk setting toward robust multi-chunk adaptation, a necessary step for generalization to genuinely longer sequences, while substantially alleviating the activation-memory bottleneck.
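To make the stabilization idea concrete, the following is a minimal toy sketch of an elastic fast-weight update across several chunks. It is not the paper's implementation: the abstract does not state the exact update rule, so the sketch assumes the standard elastic-weight-consolidation form, a quadratic penalty 0.5 * lam * sum(fisher * (w - anchor)^2) added to the per-chunk loss. The running average of squared gradients used as a diagonal Fisher estimate, the toy quadratic chunk loss, and all names (`elastic_step`, `ema`, `adapt_over_chunks`, `quadratic_grad`, `lr`, `lam`, `decay`) are illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def elastic_step(w: Vector, grad: Vector, anchor: Vector, fisher: Vector,
                 lr: float = 0.1, lam: float = 1.0) -> Vector:
    """One gradient step on the chunk loss plus the assumed elastic penalty
    0.5 * lam * sum(fisher * (w - anchor) ** 2)."""
    return [
        wi - lr * (gi + lam * fi * (wi - ai))
        for wi, gi, ai, fi in zip(w, grad, anchor, fisher)
    ]


def ema(old: Vector, new: Vector, decay: float) -> Vector:
    """Exponential moving average: decay * old + (1 - decay) * new."""
    return [decay * o + (1.0 - decay) * n for o, n in zip(old, new)]


def adapt_over_chunks(chunks: Sequence[Vector],
                      grad_fn: Callable[[Vector, Vector], Vector],
                      w0: Vector, lr: float = 0.1, lam: float = 1.0,
                      decay: float = 0.9) -> Tuple[Vector, Vector, Vector]:
    """Process chunks in order, updating fast weights, Fisher estimate, and anchor."""
    w = list(w0)
    anchor = list(w0)
    fisher = [0.0] * len(w0)
    for chunk in chunks:
        grad = grad_fn(w, chunk)
        # Fast-weight update, pulled back toward the anchor where Fisher is large.
        w = elastic_step(w, grad, anchor, fisher, lr, lam)
        # Assumption: squared gradients serve as a diagonal Fisher proxy.
        fisher = ema(fisher, [g * g for g in grad], decay)
        # The anchor follows an EMA of past fast weights.
        anchor = ema(anchor, w, decay)
    return w, anchor, fisher


def quadratic_grad(w: Vector, target: Vector) -> Vector:
    """Toy chunk loss 0.5 * sum((w - target) ** 2); its gradient is w - target."""
    return [wi - ti for wi, ti in zip(w, target)]


if __name__ == "__main__":
    # Two phases of chunks with different targets, to mimic a shifting stream.
    chunks = [[1.0, -2.0]] * 10 + [[3.0, 0.0]] * 10
    w, anchor, fisher = adapt_over_chunks(chunks, quadratic_grad, [0.0, 0.0])
    print("fast weights:", w)
    print("anchor:", anchor)
    print("fisher:", fisher)
```

Setting `lam=0.0` recovers a fully plastic update with no pull toward the anchor, which corresponds to the failure mode the abstract attributes to plain LaCT. A larger `decay` makes the anchor change more slowly, trading plasticity for stability.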
