WORLDMEM: Long-term Consistent World Simulation with Memory

Abstract


World simulation has gained increasing popularity due to its ability to model virtual environments and predict the consequences of actions. However, the limited temporal context window often causes failures to maintain long-term consistency, particularly 3D spatial consistency. In this work, we present WorldMem, a framework that enhances scene generation with a memory bank consisting of memory units that store memory frames and their states (e.g., poses and timestamps). By employing a memory attention mechanism that extracts relevant information from these memory frames based on their states, our method can accurately reconstruct previously observed scenes, even across significant viewpoint or temporal gaps. Furthermore, by incorporating timestamps into the states, our framework not only models a static world but also captures its dynamic evolution over time, enabling both perception and interaction within the simulated world. Extensive experiments in both virtual and real scenarios validate the effectiveness of our approach.
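The abstract gives no implementation details, so the following is only a toy, hypothetical sketch of the idea it describes: a memory bank of units that each store a frame and its state (pose and timestamp), queried by weighting stored frames according to how close their states are to the current state. The names `MemoryUnit`, `MemoryBank`, `attend`, `tau`, and `time_weight` are illustrative assumptions. WorldMem's actual memory attention is a learned mechanism; here it is swapped for a hand-crafted softmax over negative squared state distances, and plain lists of floats stand in for encoded frames.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class MemoryUnit:
    """Hypothetical memory unit: a frame feature plus the state it was observed in."""
    frame: List[float]   # stand-in for an encoded frame (assumption)
    pose: List[float]    # camera/agent pose; its layout is an assumption
    timestamp: float


@dataclass
class MemoryBank:
    units: List[MemoryUnit] = field(default_factory=list)

    def add(self, frame: Sequence[float], pose: Sequence[float], timestamp: float) -> None:
        self.units.append(MemoryUnit(list(frame), list(pose), float(timestamp)))

    def attend(self, pose: Sequence[float], timestamp: float,
               tau: float = 1.0, time_weight: float = 0.1) -> Tuple[List[float], List[float]]:
        """Blend stored frames with weights driven by state similarity.

        Logits are negative squared distances between the query state and each
        stored state (a hand-crafted substitute for learned attention), so units
        observed from a nearby pose at a nearby time get larger weights.
        """
        if not self.units:
            raise ValueError("memory bank is empty")
        logits = []
        for u in self.units:
            if len(u.pose) != len(pose):
                raise ValueError("pose dimensionality mismatch")
            pose_d2 = sum((a - b) ** 2 for a, b in zip(pose, u.pose))
            time_d2 = (timestamp - u.timestamp) ** 2
            logits.append(-(pose_d2 + time_weight * time_d2) / tau)
        m = max(logits)  # subtract the max logit for numerical stability
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the stored frame features.
        dim = len(self.units[0].frame)
        out = [0.0] * dim
        for w, u in zip(weights, self.units):
            for i in range(dim):
                out[i] += w * u.frame[i]
        return out, weights


if __name__ == "__main__":
    bank = MemoryBank()
    bank.add([1.0, 0.0], pose=[0.0, 0.0], timestamp=0)   # seen near the origin
    bank.add([0.0, 1.0], pose=[10.0, 0.0], timestamp=5)  # seen far away, later
    feature, weights = bank.attend(pose=[0.5, 0.0], timestamp=1)
    print(weights)
```

Querying from a pose near the origin puts almost all of the weight on the first unit, which mirrors the abstract's point that state information lets the model recall the frames relevant to a revisited viewpoint. Including the timestamp in the distance is one simple way to let recency influence retrieval; a learned model would instead embed these states and let attention decide how to use them.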
