Pearl: A Production-ready Reinforcement Learning Agent

Abstract

Reinforcement learning (RL) offers a versatile framework for achieving long-term goals. Its generality lets us formalize a wide range of problems that real-world intelligent systems encounter, such as dealing with delayed rewards, handling partial observability, addressing the exploration-exploitation dilemma, using offline data to improve online performance, and ensuring that safety constraints are met. Despite considerable progress by the RL research community on these issues, existing open-source RL libraries tend to focus on a narrow portion of the RL solution pipeline, leaving other aspects largely unattended. This paper introduces Pearl, a production-ready RL agent software package explicitly designed to address these challenges in a modular fashion. In addition to presenting preliminary benchmark results, the paper highlights Pearl's adoption in industry to demonstrate its readiness for production use. Pearl is open-sourced on GitHub at github.com/facebookresearch/pearl, and its official website is pearlagent.github.io.
