Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models

Abstract

Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models. It is built with a decoupled design, consisting of (1) an RFT-core that unifies and generalizes synchronous/asynchronous, on-policy/off-policy, and online/offline modes of RFT, (2) seamless integration for agent-environment interaction with high efficiency and robustness, and (3) systematic data pipelines optimized for RFT. Trinity-RFT can be easily adapted for diverse application scenarios, and serves as a unified platform for exploring advanced reinforcement learning paradigms. This technical report outlines the vision, features, design and implementations of Trinity-RFT, accompanied by extensive examples demonstrating the utility and user-friendliness of the proposed framework.
