One Life to Learn: Inferring Symbolic World Models for Stochastic Environments from Unguided Exploration

October 14, 2025
Authors: Zaid Khan, Archiki Prasad, Elias Stengel-Eskin, Jaemin Cho, Mohit Bansal
cs.AI

Abstract

Symbolic world modeling requires inferring and representing an environment's transitional dynamics as an executable program. Prior work has focused on largely deterministic environments with abundant interaction data, simple mechanics, and human guidance. We address a more realistic and challenging setting, learning in a complex, stochastic environment where the agent has only "one life" to explore a hostile environment without human guidance. We introduce OneLife, a framework that models world dynamics through conditionally-activated programmatic laws within a probabilistic programming framework. Each law operates through a precondition-effect structure, activating in relevant world states. This creates a dynamic computation graph that routes inference and optimization only through relevant laws, avoiding scaling challenges when all laws contribute to predictions about a complex, hierarchical state, and enabling the learning of stochastic dynamics even with sparse rule activation. To evaluate our approach under these demanding constraints, we introduce a new evaluation protocol that measures (a) state ranking, the ability to distinguish plausible future states from implausible ones, and (b) state fidelity, the ability to generate future states that closely resemble reality. We develop and evaluate our framework on Crafter-OO, our reimplementation of the Crafter environment that exposes a structured, object-oriented symbolic state and a pure transition function that operates on that state alone. OneLife can successfully learn key environment dynamics from minimal, unguided interaction, outperforming a strong baseline on 16 out of 23 scenarios tested. We also test OneLife's planning ability, with simulated rollouts successfully identifying superior strategies. Our work establishes a foundation for autonomously constructing programmatic world models of unknown, complex environments.
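
The precondition-effect idea and the state-ranking metric described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the `Law` class, the flat dictionary state, the example laws, the probabilities, and the fixed weights are all hypothetical, chosen only to show how routing a prediction through the active laws might look. In the paper the state is hierarchical and the parameters are learned; here they are hard-coded.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical flat symbolic state; the real Crafter-OO state is hierarchical.
State = Dict[str, int]


@dataclass
class Law:
    """A toy precondition-effect law that predicts one state variable."""
    name: str
    variable: str                                      # variable this law predicts
    precondition: Callable[[State, str], bool]         # is the law relevant here?
    effect: Callable[[State, str], Dict[int, float]]   # distribution over next value
    weight: float = 1.0                                # assumed fixed; learned in a real system


def active_laws(laws: List[Law], state: State, action: str) -> List[Law]:
    """Keep only the laws whose precondition holds in this state."""
    return [law for law in laws if law.precondition(state, action)]


def log_score(laws: List[Law], state: State, action: str,
              next_state: State, eps: float = 1e-6) -> float:
    """Log-plausibility of next_state, computed from active laws only."""
    by_var: Dict[str, List[Law]] = {}
    for law in active_laws(laws, state, action):
        by_var.setdefault(law.variable, []).append(law)

    total = 0.0
    for var, value in state.items():
        group = by_var.get(var)
        if not group:
            # Assumption: a variable with no active law stays unchanged.
            p = 1.0 if next_state[var] == value else 0.0
        else:
            # Weighted mixture of the active laws' predictions for this variable.
            z = sum(law.weight for law in group)
            p = sum(law.weight * law.effect(state, action).get(next_state[var], 0.0)
                    for law in group) / z
        total += math.log(max(p, eps))
    return total


def rank_states(laws: List[Law], state: State, action: str,
                candidates: List[State]) -> List[State]:
    """State ranking: order candidate futures from most to least plausible."""
    return sorted(candidates,
                  key=lambda s: log_score(laws, state, action, s),
                  reverse=True)


# Two illustrative laws with made-up probabilities.
LAWS = [
    Law("collect_wood", "wood",
        lambda s, a: a == "collect" and s["facing_tree"] == 1,
        lambda s, a: {s["wood"] + 1: 1.0}),
    Law("zombie_attack", "health",
        lambda s, a: s["zombie_near"] == 1,
        lambda s, a: {s["health"] - 2: 0.3, s["health"]: 0.7}),
]

state = {"wood": 0, "health": 9, "facing_tree": 1, "zombie_near": 0}
plausible = {**state, "wood": 1}      # wood collected
implausible = {**state, "health": 5}  # damage with no zombie nearby
ranked = rank_states(LAWS, state, "collect", [implausible, plausible])
```

In this example only `collect_wood` is active, so the zombie law contributes nothing to the score. That is the point of conditional activation: laws that are irrelevant to the current state neither influence the prediction nor, in a learned system, receive parameter updates from it.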