Zero-shot World Models Are Developmentally Efficient Learners

Abstract

Young children demonstrate early abilities to understand their physical world, estimating depth, motion, object coherence, interactions, and many other aspects of physical scene understanding. Children are both data-efficient and flexible cognitive systems: they build competence from extremely limited training data and generalize to myriad untrained tasks, which remains a major challenge even for today's best AI systems. Here we introduce a novel computational hypothesis for these abilities, the Zero-shot Visual World Model (ZWM). ZWM is based on three principles: a sparse temporally-factored predictor that decouples appearance from dynamics; zero-shot estimation through approximate causal inference; and composition of inferences to build more complex abilities. We show that ZWM can be learned from the first-person experience of a single child, rapidly generating competence across multiple physical understanding benchmarks. It also broadly recapitulates behavioral signatures of child development and builds brain-like internal representations. Our work presents a blueprint for efficient and flexible learning from human-scale data, advancing both a computational account of children's early physical understanding and a path toward data-efficient AI systems.
