Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models
May 23, 2025
Authors: Xuchen Pan, Yanxi Chen, Yushuo Chen, Yuchang Sun, Daoyuan Chen, Wenhao Zhang, Yuexiang Xie, Yilun Huang, Yilei Zhang, Dawei Gao, Yaliang Li, Bolin Ding, Jingren Zhou
cs.AI
Abstract
Trinity-RFT is a general-purpose, flexible, and scalable framework designed for reinforcement fine-tuning (RFT) of large language models. It is built with a decoupled design, consisting of (1) an RFT-core that unifies and generalizes synchronous/asynchronous, on-policy/off-policy, and online/offline modes of RFT; (2) seamless, efficient, and robust integration for agent-environment interaction; and (3) systematic data pipelines optimized for RFT. Trinity-RFT can be easily adapted to diverse application scenarios and serves as a unified platform for exploring advanced reinforcement learning paradigms. This technical report outlines the vision, features, design, and implementation of Trinity-RFT, accompanied by extensive examples demonstrating the utility and user-friendliness of the proposed framework.
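To make the decoupled design concrete, the following is a minimal, self-contained sketch in plain Python. It is not the actual Trinity-RFT API; every name in it is hypothetical. It illustrates how routing all experience through a shared buffer lets the same explorer/trainer pair run either asynchronously (off-policy, as shown) or synchronously (on-policy, by alternating the two loops):

```python
# Illustrative sketch only -- NOT the Trinity-RFT API; all names are
# hypothetical. It shows how a decoupled explorer/trainer design can unify
# synchronous and asynchronous RFT by routing experience through a buffer.
import queue
import random
import threading
import time

experience_buffer: "queue.Queue[dict]" = queue.Queue(maxsize=64)

def explorer(num_rollouts: int) -> None:
    """Generate rollouts (agent-environment interactions) into the buffer."""
    for step in range(num_rollouts):
        rollout = {"prompt": f"task-{step}", "response": "...",
                   "reward": random.random()}  # stand-in for a real rollout
        experience_buffer.put(rollout)  # blocks when the buffer is full
        time.sleep(0.01)                # stands in for slow LLM generation

def trainer(num_updates: int, batch_size: int = 4) -> None:
    """Consume experience batches and apply (mock) policy updates."""
    for update in range(num_updates):
        batch = [experience_buffer.get() for _ in range(batch_size)]
        avg_reward = sum(item["reward"] for item in batch) / batch_size
        print(f"update {update}: mean reward {avg_reward:.3f}")

# Asynchronous mode: explorer and trainer run concurrently, so the trainer
# may train on slightly stale (off-policy) experience.
t_explore = threading.Thread(target=explorer, args=(32,))
t_train = threading.Thread(target=trainer, args=(8,))
t_explore.start(); t_train.start()
t_explore.join(); t_train.join()
# Synchronous (on-policy) mode would instead alternate in a single loop:
# generate exactly one batch, then train on exactly that batch.
```

The buffer is the decoupling point: neither side depends on the other's schedule, which is how a single code path can, in principle, cover the synchronous/asynchronous and on-policy/off-policy modes described in the abstract.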