

MIND: Benchmarking Memory Consistency and Action Control in World Models

February 8, 2026
作者: Yixuan Ye, Xuanyu Lu, Yuxin Jiang, Yuchao Gu, Rui Zhao, Qiwei Liang, Jiachun Pan, Fengda Zhang, Weijia Wu, Alex Jinpeng Wang
cs.AI

Abstract

World models aim to understand, remember, and predict dynamic visual environments, yet a unified benchmark for evaluating these fundamental abilities remains lacking. To address this gap, we introduce MIND, the first open-domain closed-loop revisit benchmark for evaluating Memory consIstency and action coNtrol in worlD models. MIND contains 250 high-quality videos at 1080p and 24 FPS: 100 first-person and 100 third-person clips under a shared action space, plus 25 + 25 clips across varied action spaces covering eight diverse scenes. We design an efficient evaluation framework that measures the two core abilities, memory consistency and action control, by capturing temporal stability and contextual coherence across viewpoints. We further construct diverse action spaces, varying character movement speed and camera rotation angle, to evaluate generalization across action spaces under shared scenes. To facilitate future benchmarking on MIND, we introduce MIND-World, a novel interactive Video-to-World baseline. Extensive experiments demonstrate the completeness of MIND and reveal key challenges for current world models, including maintaining long-term memory consistency and generalizing across action spaces. Project page: https://csu-jpg.github.io/MIND.github.io/
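To make the closed-loop revisit protocol concrete, below is a minimal sketch of how a MIND-style clip, its action space (movement speed and camera rotation angle), and a revisit-based memory-consistency score could be represented. This is an illustrative assumption inferred from the abstract, not the benchmark's actual API: `MindClip`, `ActionSpace`, `WorldModel`, `revisit_pairs`, and the PSNR-based scoring are all hypothetical names and choices.

```python
# Hypothetical sketch of a MIND-style closed-loop revisit evaluation.
# All interfaces here are assumptions for illustration, not the paper's code.
from dataclasses import dataclass
from typing import List, Literal, Protocol, Tuple
import numpy as np

@dataclass
class ActionSpace:
    move_speed: float           # character movement speed (assumed units per second)
    camera_rotation_deg: float  # camera rotation angle applied per action step

@dataclass
class MindClip:
    frames: np.ndarray          # (T, H, W, 3) ground-truth video, 1080p at 24 FPS
    actions: List[int]          # per-step discrete actions driving the rollout
    viewpoint: Literal["first-person", "third-person"]
    action_space: ActionSpace
    revisit_pairs: List[Tuple[int, int]]  # (first_visit_t, revisit_t) frame indices

class WorldModel(Protocol):
    """Assumed interactive interface: condition on a first frame, then step."""
    def reset(self, first_frame: np.ndarray) -> object: ...
    def step(self, state: object, action: int) -> Tuple[object, np.ndarray]: ...

def psnr(a: np.ndarray, b: np.ndarray) -> float:
    """Peak signal-to-noise ratio between two uint8 frames."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)

def memory_consistency(model: WorldModel, clip: MindClip) -> float:
    """Closed-loop rollout: condition on the first frame, replay the action
    trajectory, then score how well the model's frames at revisited locations
    match its own earlier renderings of the same place."""
    state = model.reset(clip.frames[0])
    rollout = [clip.frames[0]]
    for action in clip.actions:
        state, frame = model.step(state, action)  # model predicts the next frame
        rollout.append(frame)
    scores = [psnr(rollout[t0], rollout[t1]) for t0, t1 in clip.revisit_pairs]
    return float(np.mean(scores))
```

The key design point this sketch tries to capture is that revisit scoring compares the model's prediction against its own earlier output for the same location, so the metric isolates memory drift rather than raw reconstruction quality; the actual MIND metrics and interfaces may differ.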