
EBT-Policy: Energy Unlocks Emergent Physical Reasoning Capabilities

October 31, 2025
Authors: Travis Davies, Yiqi Huang, Alexi Gladstone, Yunxin Liu, Xiang Chen, Heng Ji, Huxian Liu, Luhui Hu
cs.AI

Abstract

Implicit policies parameterized by generative models, such as Diffusion Policy, have become the standard for policy learning and Vision-Language-Action (VLA) models in robotics. However, these approaches often suffer from high computational cost, exposure bias, and unstable inference dynamics, which lead to divergence under distribution shifts. Energy-Based Models (EBMs) address these issues by learning energy landscapes end-to-end and modeling equilibrium dynamics, offering improved robustness and reduced exposure bias. Yet, policies parameterized by EBMs have historically struggled to scale effectively. Recent work on Energy-Based Transformers (EBTs) demonstrates the scalability of EBMs to high-dimensional spaces, but their potential for solving core challenges in physically embodied models remains underexplored. We introduce a new energy-based architecture, EBT-Policy, that solves core issues in robotic and real-world settings. Across simulated and real-world tasks, EBT-Policy consistently outperforms diffusion-based policies, while requiring less training and inference computation. Remarkably, on some tasks it converges within just two inference steps, a 50x reduction compared to Diffusion Policy's 100. Moreover, EBT-Policy exhibits emergent capabilities not seen in prior models, such as zero-shot recovery from failed action sequences using only behavior cloning and without explicit retry training. By leveraging its scalar energy for uncertainty-aware inference and dynamic compute allocation, EBT-Policy offers a promising path toward robust, generalizable robot behavior under distribution shifts.
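
To make the idea of energy-guided inference with dynamic compute allocation concrete, here is a minimal sketch. It is not the EBT-Policy implementation: `toy_energy` is a hypothetical hand-written quadratic standing in for a learned energy network, and the names `infer_action`, `step_size`, `tol`, and `max_steps` are assumptions chosen for illustration. The sketch only shows the general pattern of refining an action by gradient descent on a scalar energy and stopping as soon as the energy is low enough, so that easy cases use fewer steps than hard ones.

```python
import numpy as np

def toy_energy(obs, action):
    """Hypothetical energy: low when the action matches a target derived from obs.

    A real energy-based policy would use a learned network here; this
    quadratic is only a stand-in so the example runs on its own.
    Returns the scalar energy and its gradient with respect to the action.
    """
    target = 2.0 * obs          # stand-in for a learned observation-to-action mapping
    diff = action - target
    return 0.5 * float(diff @ diff), diff

def infer_action(obs, init_action, step_size=0.5, tol=1e-4, max_steps=100):
    """Refine an action by gradient descent on the energy, stopping early.

    Returns (action, final_energy, number_of_update_steps_taken).
    """
    action = np.asarray(init_action, dtype=float).copy()
    for steps in range(max_steps + 1):
        energy, grad = toy_energy(obs, action)
        # Stop when the energy is low enough or the step budget is used up.
        if energy < tol or steps == max_steps:
            return action, energy, steps
        action = action - step_size * grad

obs = np.array([1.0, -0.5])
_, e_near, n_near = infer_action(obs, [2.1, -1.0])    # starts close to the minimum
_, e_far, n_far = infer_action(obs, [10.0, 10.0])     # starts far from the minimum
print(f"near start: {n_near} steps, energy {e_near:.2e}")
print(f"far start:  {n_far} steps, energy {e_far:.2e}")
```

In this toy setting the initial guess that starts near the energy minimum stops after fewer updates than the one that starts far away, which is the sense in which a scalar energy can drive both a confidence signal and a variable inference budget.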