From Masks to Worlds: A Hitchhiker's Guide to World Models

October 23, 2025
Authors: Jinbin Bai, Yu Lei, Hecong Wu, Yuchen Zhu, Shufan Li, Yi Xin, Xiangtai Li, Molei Tao, Aditya Grover, Ming-Hsuan Yang
cs.AI

Abstract

This is not a typical survey of world models; it is a guide for those who want to build worlds. We do not aim to catalog every paper that has ever mentioned a "world model". Instead, we follow one clear road: from early masked models that unified representation learning across modalities, to unified architectures that share a single paradigm, then to interactive generative models that close the action-perception loop, and finally to memory-augmented systems that sustain consistent worlds over time. We bypass loosely related branches to focus on the core: the generative heart, the interactive loop, and the memory system. We argue that this is the most promising path towards true world models.