Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models

Abstract

Trinity-RFT is a general-purpose, flexible, and scalable framework designed for reinforcement fine-tuning (RFT) of large language models. It is built with a decoupled design consisting of three main components: (1) an RFT-core that unifies and generalizes the synchronous/asynchronous, on-policy/off-policy, and online/offline modes of RFT, (2) seamless integration of agent-environment interaction with high efficiency and robustness, and (3) systematic data pipelines optimized for RFT. Trinity-RFT can be easily adapted to diverse application scenarios and serves as a unified platform for exploring advanced reinforcement learning paradigms. This technical report outlines the vision, features, design, and implementation of Trinity-RFT, accompanied by extensive examples that demonstrate the utility and user-friendliness of the proposed framework.

