OpenTinker: Separating Concerns in Agentic Reinforcement Learning

January 12, 2026
Authors: Siqi Zhu, Jiaxuan You
cs.AI

Abstract

We introduce OpenTinker, an infrastructure for reinforcement learning (RL) of large language model (LLM) agents built around a separation of concerns across algorithm design, execution, and agent-environment interaction. Rather than relying on monolithic, end-to-end RL pipelines, OpenTinker decomposes agentic learning systems into lightweight, composable components with clearly defined abstraction boundaries. Users specify agents, environments, and interaction protocols, while inference and training are delegated to a managed execution runtime. OpenTinker introduces a centralized scheduler for managing training and inference workloads, including LoRA-based and full-parameter RL, supervised fine-tuning, and inference, over shared resources. We further discuss design principles for extending OpenTinker to multi-agent training. Finally, we present a set of RL use cases that demonstrate the effectiveness of the framework in practical agentic learning scenarios.
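The abstract describes a division of labor in which the user defines only the agent, the environment, and their interaction protocol, while inference and training are handled by a managed runtime. Below is a minimal Python sketch of what such a user-facing split could look like. All names here (`Agent`, `Environment`, `rollout`, the toy classes) are illustrative assumptions, not OpenTinker's actual API; the point is only the abstraction boundary between user-specified interaction logic and an execution backend.

```python
# Hypothetical sketch of the separation-of-concerns idea from the abstract.
# None of these names are taken from OpenTinker; they illustrate the split
# between user-defined interaction logic and a managed execution runtime.
from dataclasses import dataclass, field
from typing import Protocol


class Environment(Protocol):
    """User-defined environment: the runtime never needs its internals."""
    def reset(self) -> str: ...
    def step(self, action: str) -> tuple[str, float, bool]: ...


class Agent(Protocol):
    """User-defined agent; in a real system, act() would call managed inference."""
    def act(self, observation: str) -> str: ...


@dataclass
class Trajectory:
    # (observation, action, reward) tuples collected during one episode.
    steps: list[tuple[str, str, float]] = field(default_factory=list)


def rollout(agent: Agent, env: Environment, max_steps: int = 8) -> Trajectory:
    """Interaction protocol: the only piece the user must specify.

    In a framework like the one described, this loop would execute against
    a managed runtime (which owns model weights, LoRA adapters, and GPU
    scheduling) rather than against local objects.
    """
    traj = Trajectory()
    obs = env.reset()
    for _ in range(max_steps):
        action = agent.act(obs)
        next_obs, reward, done = env.step(action)
        traj.steps.append((obs, action, reward))
        obs = next_obs
        if done:
            break
    return traj


# Toy implementations so the sketch runs end to end.
class EchoAgent:
    def act(self, observation: str) -> str:
        return f"ack:{observation}"


class CountdownEnv:
    def __init__(self, n: int = 3) -> None:
        self.n = n

    def reset(self) -> str:
        self.t = 0
        return "start"

    def step(self, action: str) -> tuple[str, float, bool]:
        self.t += 1
        done = self.t >= self.n
        return f"obs-{self.t}", (1.0 if done else 0.0), done


if __name__ == "__main__":
    print(rollout(EchoAgent(), CountdownEnv()).steps)
```

Under this kind of split, the centralized scheduler mentioned in the abstract would sit behind the runtime boundary, multiplexing rollout inference, LoRA-based and full-parameter RL updates, and supervised fine-tuning jobs over the same pool of accelerators, without the user's `rollout` code changing.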