EBT-Policy: Energy Unlocks Emergent Physical Reasoning Capabilities
October 31, 2025
Authors: Travis Davies, Yiqi Huang, Alexi Gladstone, Yunxin Liu, Xiang Chen, Heng Ji, Huxian Liu, Luhui Hu
cs.AI
Abstract
Implicit policies parameterized by generative models, such as Diffusion
Policy, have become the standard for policy learning and Vision-Language-Action
(VLA) models in robotics. However, these approaches often suffer from high
computational cost, exposure bias, and unstable inference dynamics, which lead
to divergence under distribution shifts. Energy-Based Models (EBMs) address
these issues by learning energy landscapes end-to-end and modeling equilibrium
dynamics, offering improved robustness and reduced exposure bias. Yet, policies
parameterized by EBMs have historically struggled to scale effectively. Recent
work on Energy-Based Transformers (EBTs) demonstrates the scalability of EBMs
to high-dimensional spaces, but their potential for solving core challenges in
physically embodied models remains underexplored. We introduce a new
energy-based architecture, EBT-Policy, that solves core issues in robotic and
real-world settings. Across simulated and real-world tasks, EBT-Policy
consistently outperforms diffusion-based policies, while requiring less
training and inference computation. Remarkably, on some tasks it converges
within just two inference steps, a 50x reduction compared to the 100 steps of
Diffusion Policy. Moreover, EBT-Policy exhibits emergent capabilities not seen in prior
models, such as zero-shot recovery from failed action sequences using only
behavior cloning and without explicit retry training. By leveraging its scalar
energy for uncertainty-aware inference and dynamic compute allocation,
EBT-Policy offers a promising path toward robust, generalizable robot behavior
under distribution shifts.
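
To make the idea of energy-guided inference with dynamic compute allocation more concrete, here is a minimal sketch in Python. It is not the EBT-Policy architecture: the learned energy network is replaced by a hand-written quadratic energy, and the names `target_action`, `energy`, `energy_grad`, and `infer_action`, along with the step size and tolerance, are assumptions made purely for illustration. The sketch only shows the general pattern of refining an action by descending a scalar energy and stopping as soon as the energy landscape is flat enough.

```python
import random


def target_action(obs):
    # Hypothetical stand-in for a learned network: maps a 3-D observation
    # to the 2-D action where the toy energy is lowest.
    return [obs[0] + 2.0 * obs[2], -obs[1] + 0.5 * obs[2]]


def energy(obs, action):
    # Toy scalar energy: half the squared distance to the target action.
    target = target_action(obs)
    return 0.5 * sum((a - t) ** 2 for a, t in zip(action, target))


def energy_grad(obs, action):
    # Analytic gradient of the toy energy with respect to the action.
    target = target_action(obs)
    return [a - t for a, t in zip(action, target)]


def infer_action(obs, step_size=0.5, max_steps=100, tol=1e-3, seed=0):
    # Start from a random action and descend the energy. The loop exits
    # early once the gradient norm falls below tol, so easy inputs use
    # fewer steps than the max_steps budget.
    rng = random.Random(seed)
    action = [rng.gauss(0.0, 1.0) for _ in range(2)]
    steps = 0
    while steps < max_steps:
        grad = energy_grad(obs, action)
        grad_norm = sum(g * g for g in grad) ** 0.5
        if grad_norm < tol:
            break
        action = [a - step_size * g for a, g in zip(action, grad)]
        steps += 1
    return action, energy(obs, action), steps


obs = [0.2, -0.4, 1.0]
action, final_energy, steps = infer_action(obs)
print(f"steps used: {steps}, final energy: {final_energy:.2e}")
```

In this toy setting the final energy doubles as a crude confidence signal: a caller could compare it against a threshold and spend more steps, or re-sample the starting action, when it stays high. How EBT-Policy itself uses the energy is described only at a high level in the abstract, so that usage is likewise an assumption.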