Physics Informed Viscous Value Representations
February 26, 2026
Authors: Hrishikesh Viswanath, Juanwu Lu, S. Talha Bukhari, Damon Conover, Ziran Wang, Aniket Bera
cs.AI
Abstract
Offline goal-conditioned reinforcement learning (GCRL) learns goal-conditioned policies from static, pre-collected datasets. However, accurate value estimation remains a challenge due to the limited coverage of the state-action space. Recent physics-informed approaches have sought to address this by imposing physical and geometric constraints on the value function through regularization terms defined over first-order partial differential equations (PDEs), such as the Eikonal equation. These formulations, however, can be ill-posed in complex, high-dimensional environments. In this work, we propose a physics-informed regularization derived from the viscosity solution of the Hamilton-Jacobi-Bellman (HJB) equation. By providing a physics-based inductive bias, our approach grounds the learning process in optimal control theory, explicitly regularizing and bounding updates during value iteration. Furthermore, we leverage the Feynman-Kac theorem to recast the PDE solution as an expectation, enabling a tractable Monte Carlo estimate of the objective that avoids the numerical instability of higher-order gradients. Experiments demonstrate that our method improves geometric consistency and is broadly applicable to navigation as well as high-dimensional, complex manipulation tasks. Open-source code is available at https://github.com/HrishikeshVish/phys-fk-value-GCRL.
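As a rough, non-authoritative sketch of the underlying idea (in our own notation, which need not match the paper's): for a fixed policy with drift b, diffusion scale \sigma, running cost c, and discount rate \rho, a linear HJB-type equation and its Feynman-Kac expectation representation take the form

\[
\rho\, V(x) \;=\; c(x) \;+\; b(x)\cdot\nabla V(x) \;+\; \tfrac{\sigma^2}{2}\,\Delta V(x),
\qquad
V(x) \;=\; \mathbb{E}\!\left[\int_0^{\infty} e^{-\rho t}\, c(X_t)\, dt \;\middle|\; X_0 = x\right],
\]

where \(dX_t = b(X_t)\,dt + \sigma\, dW_t\). The expectation on the right can be approximated by Monte Carlo rollouts of \(X_t\), so a regularizer built from it can be evaluated without differentiating the value network twice (i.e., without forming \(\Delta V\) explicitly), which is the sense in which the abstract's Feynman-Kac reformulation sidesteps higher-order gradients.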