Zero-shot World Models Are Developmentally Efficient Learners

April 11, 2026
Authors: Khai Loong Aw, Klemen Kotar, Wanhee Lee, Seungwoo Kim, Khaled Jedoui, Rahul Venkatesh, Lilian Naing Chen, Michael C. Frank, Daniel L. K. Yamins
cs.AI

Abstract

Young children demonstrate early abilities to understand their physical world, estimating depth, motion, object coherence, interactions, and many other aspects of physical scene understanding. Children are both data-efficient and flexible cognitive systems: they build competence from extremely limited training data while generalizing to myriad untrained tasks, a combination that remains a major challenge even for today's best AI systems. Here we introduce a novel computational hypothesis for these abilities, the Zero-shot Visual World Model (ZWM). ZWM is based on three principles: a sparse, temporally factored predictor that decouples appearance from dynamics; zero-shot estimation through approximate causal inference; and composition of inferences to build more complex abilities. We show that ZWM can be learned from the first-person experience of a single child, rapidly generating competence across multiple physical understanding benchmarks. It also broadly recapitulates behavioral signatures of child development and builds brain-like internal representations. Our work presents a blueprint for efficient and flexible learning from human-scale data, advancing both a computational account of children's early physical understanding and a path toward data-efficient AI systems.
April 15, 2026