

Debiased Model-based Representations for Sample-efficient Continuous Control

May 12, 2026
作者: Jiafei Lyu, Zichuan Lin, Scott Fujimoto, Kai Yang, Yangkun Chen, Saiyong Yang, Zongqing Lu, Deheng Ye
cs.AI

Abstract

Model-based representations have recently emerged as a promising framework that embeds latent dynamics information into the representations used for downstream off-policy actor-critic learning. This approach implicitly combines the advantages of model-free and model-based methods while avoiding the training costs associated with the latter. Nevertheless, existing model-based representation methods can fail to capture sufficient information about relevant variables and can overfit to early experiences in the replay buffer. Both issues bias representation and actor-critic learning, leading to inferior performance. To address this, we propose Debiased model-based Representations for Q-learning (DR.Q). Beyond minimizing the deviation between the representations of the current state-action pair and the next state, DR.Q explicitly maximizes the mutual information between them, and it samples transitions with faded prioritized experience replay. We evaluate DR.Q on numerous continuous control benchmarks with a single set of hyperparameters, and the results demonstrate that DR.Q matches or surpasses recent strong baselines, sometimes outperforming them by a large margin. Our code is available at https://github.com/dmksjfl/DR.Q.
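The abstract pairs a latent-consistency (deviation) term with an explicit mutual-information term over the representations of the current state-action pair and the next state. Below is a minimal PyTorch sketch of such a combined objective, using an InfoNCE-style lower bound with in-batch negatives for the MI term; the function and parameter names (`representation_loss`, `W`, `mi_weight`) and the bilinear contrastive score are illustrative assumptions, not the authors' exact implementation.

```python
# A minimal sketch of the two representation objectives the abstract names:
# (1) minimizing the deviation between the representation of the current
# (state, action) pair and that of the next state, and (2) explicitly
# maximizing a lower bound on the mutual information between them.
import torch
import torch.nn.functional as F

def representation_loss(phi_sa, phi_next, W, mi_weight=1.0):
    """phi_sa:   (B, d) representations of (state, action) pairs
       phi_next: (B, d) representations of the corresponding next states
       W:        (d, d) learnable bilinear matrix for the contrastive score"""
    # (1) Deviation term: pull each phi_sa toward its own next-state code.
    deviation = F.mse_loss(phi_sa, phi_next.detach())

    # (2) MI term: InfoNCE lower bound on I(phi_sa; phi_next); the other
    # next states in the batch serve as negatives for every anchor.
    logits = phi_sa @ W @ phi_next.t()                      # (B, B) scores
    labels = torch.arange(phi_sa.size(0), device=phi_sa.device)
    info_nce = F.cross_entropy(logits, labels)              # minimize = maximize MI bound

    return deviation + mi_weight * info_nce
```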
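Faded prioritized experience replay is likewise described only at a high level: the idea is to damp the sampling priority of older transitions so the learner does not overfit to early replay-buffer experiences. A toy sketch under the assumption of an exponential age-based fading schedule follows; the `fade_rate` parameter and the schedule itself are hypothetical, and the paper's exact scheme may differ.

```python
# A toy sketch of "faded" prioritized sampling: each transition's priority is
# damped by an age-dependent factor, so early experiences are revisited less
# often as training progresses. The exponential schedule is an assumption.
import numpy as np

def faded_sample(priorities, insert_steps, current_step,
                 batch_size, fade_rate=1e-5, alpha=0.6):
    """priorities:   (N,) TD-error-based priorities
       insert_steps: (N,) environment step at which each transition was stored
       current_step: current environment step"""
    age = current_step - insert_steps
    faded = (priorities ** alpha) * np.exp(-fade_rate * age)  # damp old items
    probs = faded / faded.sum()
    return np.random.choice(len(priorities), size=batch_size, p=probs)
```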