Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models

May 23, 2025
Authors: Xuchen Pan, Yanxi Chen, Yushuo Chen, Yuchang Sun, Daoyuan Chen, Wenhao Zhang, Yuexiang Xie, Yilun Huang, Yilei Zhang, Dawei Gao, Yaliang Li, Bolin Ding, Jingren Zhou
cs.AI

Abstract

Trinity-RFT is a general-purpose, flexible, and scalable framework designed for reinforcement fine-tuning (RFT) of large language models. It is built with a decoupled design, consisting of (1) an RFT-core that unifies and generalizes synchronous/asynchronous, on-policy/off-policy, and online/offline modes of RFT, (2) seamless, efficient, and robust integration for agent-environment interaction, and (3) systematic data pipelines optimized for RFT. Trinity-RFT can be easily adapted to diverse application scenarios and serves as a unified platform for exploring advanced reinforcement learning paradigms. This technical report outlines the vision, features, design, and implementation of Trinity-RFT, accompanied by extensive examples demonstrating the utility and user-friendliness of the proposed framework.
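
The decoupled design described above can be made concrete with a minimal sketch. The snippet below is not Trinity-RFT's actual API; all names here (ExperienceBuffer, Explorer, Trainer, run_sync, run_async) are hypothetical. It only illustrates the general idea of separating experience collection from model updates, which lets the same components run either in lockstep (synchronous, on-policy) or concurrently (asynchronous, off-policy).

```python
# Conceptual sketch only -- not Trinity-RFT's real API. All names are hypothetical.
import queue
import random
import threading
import time

class ExperienceBuffer:
    """Thread-safe queue that decouples experience collection from training."""
    def __init__(self):
        self._q = queue.Queue()

    def put(self, item):
        self._q.put(item)

    def get_batch(self, n):
        # Blocks until n experiences are available.
        return [self._q.get() for _ in range(n)]

class Explorer:
    """Stand-in for agent-environment interaction: emits (prompt, response, reward)."""
    def __init__(self, buffer):
        self.buffer = buffer

    def rollout(self):
        prompt = f"task-{random.randint(0, 9)}"
        response = f"answer-to-{prompt}"
        reward = random.random()  # toy environment feedback
        self.buffer.put((prompt, response, reward))

class Trainer:
    """Stand-in for the RL optimizer: consumes experience batches."""
    def __init__(self, buffer):
        self.buffer = buffer

    def step(self, batch_size=4):
        batch = self.buffer.get_batch(batch_size)
        # A real trainer would compute a policy update here; we just report reward.
        return sum(r for _, _, r in batch) / len(batch)

def run_sync(steps=3):
    """Synchronous mode: rollouts and updates alternate in lockstep (on-policy)."""
    buf = ExperienceBuffer()
    explorer, trainer = Explorer(buf), Trainer(buf)
    for step in range(steps):
        for _ in range(4):
            explorer.rollout()
        print(f"[sync]  step {step}: mean reward {trainer.step():.3f}")

def run_async(steps=3):
    """Asynchronous mode: exploration runs concurrently with training (off-policy)."""
    buf = ExperienceBuffer()
    explorer, trainer = Explorer(buf), Trainer(buf)
    stop = threading.Event()

    def explore_loop():
        while not stop.is_set():
            explorer.rollout()
            time.sleep(0.001)  # avoid busy-spinning in this toy example

    t = threading.Thread(target=explore_loop, daemon=True)
    t.start()
    for step in range(steps):
        print(f"[async] step {step}: mean reward {trainer.step():.3f}")
    stop.set()
    t.join()

if __name__ == "__main__":
    run_sync()
    run_async()
```

Because the trainer sees only the buffer, the exploration schedule, data pipeline, or policy-update rule can each be swapped without touching the other components; this is the kind of decoupling the abstract attributes to the RFT-core.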
