Physics Informed Viscous Value Representations
February 26, 2026
Authors: Hrishikesh Viswanath, Juanwu Lu, S. Talha Bukhari, Damon Conover, Ziran Wang, Aniket Bera
cs.AI
Abstract
Offline goal-conditioned reinforcement learning (GCRL) learns goal-conditioned policies from static, pre-collected datasets. However, accurate value estimation remains a challenge due to the limited coverage of the state-action space. Recent physics-informed approaches have sought to address this by imposing physical and geometric constraints on the value function through regularization defined over first-order partial differential equations (PDEs), such as the Eikonal equation. However, these formulations are often ill-posed in complex, high-dimensional environments. In this work, we propose a physics-informed regularization derived from the viscosity solution of the Hamilton-Jacobi-Bellman (HJB) equation. By providing a physics-based inductive bias, our approach grounds the learning process in optimal control theory, explicitly regularizing and bounding updates during value iteration. Furthermore, we leverage the Feynman-Kac theorem to recast the PDE solution as an expectation, enabling a tractable Monte Carlo estimation of the objective that avoids numerical instability in higher-order gradients. Experiments demonstrate that our method improves geometric consistency, making it broadly applicable to navigation and high-dimensional, complex manipulation tasks. Open-source code is available at https://github.com/HrishikeshVish/phys-fk-value-GCRL.
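To illustrate the Feynman-Kac idea the abstract invokes, the sketch below is a toy example (not the paper's actual objective): the Feynman-Kac theorem represents the solution of a linear parabolic PDE as an expectation over stochastic trajectories, which can then be estimated by Monte Carlo sampling without computing higher-order derivatives. For the heat equation ∂u/∂t = (σ²/2) ∂²u/∂x² with initial condition u(x, 0) = g(x), the solution is u(x, t) = E[g(x + σ W_t)]. The function name `feynman_kac_mc` is our own; with g(x) = x², the exact solution u(x, t) = x² + σ²t is available for comparison.

```python
import numpy as np

def feynman_kac_mc(g, x, t, sigma, n_samples=200_000, seed=0):
    """Monte Carlo estimate of u(x, t) = E[g(x + sigma * W_t)],
    the Feynman-Kac representation of the heat equation
    du/dt = (sigma^2 / 2) d2u/dx2 with u(x, 0) = g(x)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_samples)          # W_t ~ sqrt(t) * N(0, 1)
    return np.mean(g(x + sigma * np.sqrt(t) * z))

# With g(x) = x^2, the exact solution is u(x, t) = x^2 + sigma^2 * t.
estimate = feynman_kac_mc(lambda s: s**2, x=1.0, t=0.5, sigma=1.0)
exact = 1.0**2 + 1.0**2 * 0.5  # 1.5
```

The same mechanism underlies the paper's objective: replacing a PDE residual (which needs second derivatives of the value network) with a sampled expectation trades numerical differentiation for averaging over simulated trajectories.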