Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks

Abstract

World models have emerged as a critical frontier in AI research, aiming to enhance large models by infusing them with physical dynamics and world knowledge. The core objective is to enable agents to understand, predict, and interact with complex environments. However, the current research landscape remains fragmented: most approaches focus on injecting world knowledge into isolated tasks, such as visual prediction, 3D estimation, or symbol grounding, rather than on establishing a unified definition or framework. While these task-specific integrations yield performance gains, they often lack the systematic coherence required for holistic world understanding. In this paper, we analyze the limitations of such fragmented approaches and propose a unified design specification for world models. We suggest that a robust world model should not be a loose collection of capabilities but a normative framework that integrally incorporates interaction, perception, symbolic reasoning, and spatial representation. This work aims to provide a structured perspective to guide future research toward more general, robust, and principled models of the world.
