OpenTinker: Separating Concerns in Agentic Reinforcement Learning

Abstract

We introduce OpenTinker, an infrastructure for reinforcement learning (RL) of large language model (LLM) agents built around a separation of concerns across algorithm design, execution, and agent-environment interaction. Rather than relying on monolithic, end-to-end RL pipelines, OpenTinker decomposes agentic learning systems into lightweight, composable components with clearly defined abstraction boundaries. Users specify agents, environments, and interaction protocols, while inference and training are delegated to a managed execution runtime. OpenTinker introduces a centralized scheduler that manages training and inference workloads (LoRA-based and full-parameter RL, supervised fine-tuning, and inference) over shared resources. We further discuss design principles for extending OpenTinker to multi-agent training. Finally, we present a set of RL use cases that demonstrate the effectiveness of the framework in practical agentic learning scenarios.
