EBT-Policy: Energy Unlocks Emergent Physical Reasoning Capabilities
October 31, 2025
Authors: Travis Davies, Yiqi Huang, Alexi Gladstone, Yunxin Liu, Xiang Chen, Heng Ji, Huxian Liu, Luhui Hu
cs.AI
Abstract
Implicit policies parameterized by generative models, such as Diffusion
Policy, have become the standard for policy learning and Vision-Language-Action
(VLA) models in robotics. However, these approaches often suffer from high
computational cost, exposure bias, and unstable inference dynamics, which lead
to divergence under distribution shifts. Energy-Based Models (EBMs) address
these issues by learning energy landscapes end-to-end and modeling equilibrium
dynamics, offering improved robustness and reduced exposure bias. Yet, policies
parameterized by EBMs have historically struggled to scale effectively. Recent
work on Energy-Based Transformers (EBTs) demonstrates the scalability of EBMs
to high-dimensional spaces, but their potential for solving core challenges in
physically embodied models remains underexplored. We introduce a new
energy-based architecture, EBT-Policy, that solves core issues in robotic and
real-world settings. Across simulated and real-world tasks, EBT-Policy
consistently outperforms diffusion-based policies, while requiring less
training and inference computation. Remarkably, on some tasks it converges
within just two inference steps, a 50x reduction compared to the 100 steps
used by Diffusion Policy. Moreover, EBT-Policy exhibits emergent capabilities not seen in prior
models, such as zero-shot recovery from failed action sequences using only
behavior cloning and without explicit retry training. By leveraging its scalar
energy for uncertainty-aware inference and dynamic compute allocation,
EBT-Policy offers a promising path toward robust, generalizable robot behavior
under distribution shifts.
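
To make the idea of energy-guided inference with dynamic compute allocation concrete, here is a minimal sketch. It is not the EBT-Policy architecture: `toy_energy`, `toy_energy_grad`, `infer_action`, the quadratic energy, the `tanh` stand-in for a learned model, and the absolute energy threshold `tol` are all assumptions made for illustration. The sketch only shows the general pattern of refining an action by gradient descent on a scalar energy and stopping once the energy is low enough, so that easy cases use fewer steps.

```python
import numpy as np

def toy_energy(obs, action):
    """Hypothetical scalar energy: low when the action is compatible with obs.

    The tanh target is a stand-in for a learned network, not the paper's model.
    """
    target = np.tanh(obs)
    return 0.5 * float(np.sum((action - target) ** 2))

def toy_energy_grad(obs, action):
    """Gradient of toy_energy with respect to the action."""
    return action - np.tanh(obs)

def infer_action(obs, init_action, step_size=0.5, tol=1e-3, max_steps=100):
    """Refine an action by gradient descent on the energy.

    Stops as soon as the energy falls to tol or below (an assumed stopping
    rule), so the number of steps adapts to how hard the case is.
    """
    action = np.array(init_action, dtype=float)
    energy = toy_energy(obs, action)
    steps = 0
    while steps < max_steps and energy > tol:
        action = action - step_size * toy_energy_grad(obs, action)
        energy = toy_energy(obs, action)
        steps += 1
    return action, energy, steps

if __name__ == "__main__":
    obs = np.array([0.3, -1.2])
    # A poor initial guess needs several refinement steps.
    _, e_far, n_far = infer_action(obs, np.zeros(2))
    # A good initial guess already has low energy and stops early.
    _, e_near, n_near = infer_action(obs, np.tanh(obs) + 0.01)
    print(f"far start:  {n_far} steps, final energy {e_far:.5f}")
    print(f"near start: {n_near} steps, final energy {e_near:.5f}")
```

In this toy setting the final energy also serves as a crude confidence signal: an action whose energy stays high after `max_steps` could be flagged as uncertain. The reported 50x figure is simply the ratio of step counts, 100 / 2 = 50.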