Dreaming in Code for Curriculum Learning in Open-Ended Worlds

Abstract

Open-ended learning frames intelligence as emerging from continual interaction with an ever-expanding space of environments. While recent advances have utilized foundation models to programmatically generate diverse environments, these approaches often focus on discovering isolated behaviors rather than orchestrating sustained progression. In complex open-ended worlds, the large combinatorial space of possible challenges makes it difficult for agents to discover sequences of experiences that remain consistently learnable. To address this, we propose Dreaming in Code (DiCode), a framework in which foundation models synthesize executable environment code to scaffold learning toward increasing competence. In DiCode, "dreaming" takes the form of materializing code-level variations of the world. We instantiate DiCode in Craftax, a challenging open-ended benchmark characterized by rich mechanics and long-horizon progression. Empirically, DiCode enables agents to acquire long-horizon skills, achieving a 16% improvement in mean return over the strongest baseline and non-zero success on late-game combat tasks where prior methods fail. Our results suggest that code-level environment design provides a practical mechanism for curriculum control, enabling the construction of intermediate environments that bridge competence gaps in open-ended worlds. Project page and source code are available at https://konstantinosmitsides.github.io/dreaming-in-code and https://github.com/konstantinosmitsides/dreaming-in-code.
