Physics Informed Viscous Value Representations

Abstract

Offline goal-conditioned reinforcement learning (GCRL) learns goal-conditioned policies from static, pre-collected datasets. However, accurate value estimation remains a challenge because these datasets cover only a limited part of the state-action space. Recent physics-informed approaches have sought to address this by imposing physical and geometric constraints on the value function through regularization defined over first-order partial differential equations (PDEs), such as the Eikonal equation. These formulations, however, can be ill-posed in complex, high-dimensional environments. In this work, we propose a physics-informed regularization derived from the viscosity solution of the Hamilton-Jacobi-Bellman (HJB) equation. By providing a physics-based inductive bias, our approach grounds the learning process in optimal control theory, explicitly regularizing and bounding updates during value iteration. Furthermore, we leverage the Feynman-Kac theorem to recast the PDE solution as an expectation, enabling a tractable Monte Carlo estimate of the objective that avoids the numerical instability of higher-order gradients. Experiments demonstrate that our method improves geometric consistency and is broadly applicable to navigation and to complex, high-dimensional manipulation tasks. Open-source code is available at https://github.com/HrishikeshVish/phys-fk-value-GCRL.
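To make the Feynman-Kac step concrete, the following is a minimal sketch on a toy problem, not the authors' implementation. It assumes a one-dimensional linear PDE, u_t + nu * u_xx = 0 with terminal condition u(x, T) = g(x), whose solution can be written as the expectation u(x, 0) = E[g(X_T)] for the diffusion dX = sqrt(2 * nu) dW started at x. The function name `feynman_kac_estimate`, the viscosity `nu`, the horizon, and the choice g(x) = x^2 are all hypothetical and chosen only because the exact answer, x^2 + 2 * nu * T, is known in closed form. The point is that the second-derivative term is never differentiated; it is replaced by an average over sampled endpoints.

```python
import numpy as np


def feynman_kac_estimate(g, x, nu, horizon, n_samples=200_000, seed=0):
    """Monte Carlo estimate of u(x, 0) for u_t + nu * u_xx = 0, u(x, T) = g(x).

    Toy illustration only: a linear PDE with constant viscosity `nu`
    (an assumption for this sketch, not the HJB equation from the paper).
    """
    rng = np.random.default_rng(seed)
    # The diffusion dX = sqrt(2 * nu) dW has a Gaussian endpoint:
    # X_T = x + sqrt(2 * nu * T) * Z with Z ~ N(0, 1).
    z = rng.standard_normal(n_samples)
    x_end = x + np.sqrt(2.0 * nu * horizon) * z
    # Feynman-Kac: the PDE solution equals the expected terminal value.
    return float(np.mean(g(x_end)))


# Hypothetical test case with a known closed-form solution.
x0, nu, horizon = 1.0, 0.1, 0.5
estimate = feynman_kac_estimate(lambda y: y**2, x0, nu, horizon)
exact = x0**2 + 2.0 * nu * horizon  # E[X_T^2] = x^2 + 2 * nu * T
print(f"Monte Carlo estimate: {estimate:.4f}, closed form: {exact:.4f}")
```

The estimate requires only forward sampling and function evaluations of g, which is the property the abstract refers to when it says the expectation form avoids unstable higher-order gradients.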
