Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models
May 23, 2025
Authors: Xuchen Pan, Yanxi Chen, Yushuo Chen, Yuchang Sun, Daoyuan Chen, Wenhao Zhang, Yuexiang Xie, Yilun Huang, Yilei Zhang, Dawei Gao, Yaliang Li, Bolin Ding, Jingren Zhou
cs.AI
Abstract
Trinity-RFT is a general-purpose, flexible, and scalable framework designed for reinforcement fine-tuning (RFT) of large language models. It is built with a decoupled design, consisting of (1) an RFT-core that unifies and generalizes synchronous/asynchronous, on-policy/off-policy, and online/offline modes of RFT; (2) seamless, efficient, and robust integration for agent-environment interaction; and (3) systematic data pipelines optimized for RFT. Trinity-RFT can be easily adapted to diverse application scenarios and serves as a unified platform for exploring advanced reinforcement learning paradigms. This technical report outlines the vision, features, design, and implementation of Trinity-RFT, accompanied by extensive examples demonstrating the utility and user-friendliness of the proposed framework.
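To make the decoupled design concrete, the following is a minimal, self-contained sketch in plain Python. It is not the actual Trinity-RFT API; every name in it is hypothetical. It illustrates how routing all experience through a shared buffer lets the same explorer/trainer pair run either asynchronously (off-policy, as shown) or synchronously (on-policy, by alternating the two loops):

```python
# Illustrative sketch only -- NOT the Trinity-RFT API; all names are
# hypothetical. It shows how a decoupled explorer/trainer design can unify
# synchronous and asynchronous RFT by routing experience through a buffer.
import queue
import random
import threading
import time

experience_buffer: "queue.Queue[dict]" = queue.Queue(maxsize=64)

def explorer(num_rollouts: int) -> None:
    """Generate rollouts (agent-environment interactions) into the buffer."""
    for step in range(num_rollouts):
        rollout = {"prompt": f"task-{step}", "response": "...",
                   "reward": random.random()}  # stand-in for a real rollout
        experience_buffer.put(rollout)  # blocks when the buffer is full
        time.sleep(0.01)                # stands in for slow LLM generation

def trainer(num_updates: int, batch_size: int = 4) -> None:
    """Consume experience batches and apply (mock) policy updates."""
    for update in range(num_updates):
        batch = [experience_buffer.get() for _ in range(batch_size)]
        avg_reward = sum(item["reward"] for item in batch) / batch_size
        print(f"update {update}: mean reward {avg_reward:.3f}")

# Asynchronous mode: explorer and trainer run concurrently, so the trainer
# may train on slightly stale (off-policy) experience.
t_explore = threading.Thread(target=explorer, args=(32,))
t_train = threading.Thread(target=trainer, args=(8,))
t_explore.start(); t_train.start()
t_explore.join(); t_train.join()
# Synchronous (on-policy) mode would instead alternate in a single loop:
# generate exactly one batch, then train on exactly that batch.
```

The buffer is the decoupling point: neither side depends on the other's schedule, which is how a single code path can, in principle, cover the synchronous/asynchronous and on-policy/off-policy modes described in the abstract.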