OpenTinker: Separating Concerns in Agentic Reinforcement Learning

Abstract

We introduce OpenTinker, an infrastructure for reinforcement learning (RL) of large language model (LLM) agents built around a separation of concerns across algorithm design, execution, and agent-environment interaction. Rather than relying on monolithic, end-to-end RL pipelines, OpenTinker decomposes agentic learning systems into lightweight, composable components with clearly defined abstraction boundaries. Users specify agents, environments, and interaction protocols, while inference and training are delegated to a managed execution runtime. OpenTinker introduces a centralized scheduler that manages training and inference workloads over shared resources, including LoRA-based and full-parameter RL, supervised fine-tuning, and inference. We further discuss design principles for extending OpenTinker to multi-agent training. Finally, we present a set of RL use cases that demonstrate the effectiveness of the framework in practical agentic learning scenarios.
