Dreaming in Code for Curriculum Learning in Open-Ended Worlds

Abstract

Open-ended learning frames intelligence as emerging from continual interaction with an ever-expanding space of environments. While recent advances have used foundation models to programmatically generate diverse environments, these approaches often focus on discovering isolated behaviors rather than orchestrating sustained progression. In complex open-ended worlds, the large combinatorial space of possible challenges makes it difficult for agents to discover sequences of experiences that remain consistently learnable. To address this, we propose Dreaming in Code (DiCode), a framework in which foundation models synthesize executable environment code to scaffold learning toward increasing competence. In DiCode, "dreaming" takes the form of materializing code-level variations of the world. We instantiate DiCode in Craftax, a challenging open-ended benchmark characterized by rich mechanics and long-horizon progression. Empirically, DiCode enables agents to acquire long-horizon skills, achieving a 16% improvement in mean return over the strongest baseline and non-zero success on late-game combat tasks where prior methods fail. Our results suggest that code-level environment design provides a practical mechanism for curriculum control, enabling the construction of intermediate environments that bridge competence gaps in open-ended worlds. Project page and source code are available at https://konstantinosmitsides.github.io/dreaming-in-code and https://github.com/konstantinosmitsides/dreaming-in-code.

