TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration

Abstract

While Large Language Models (LLMs) have empowered AI research agents to perform isolated scientific tasks, automating complex real-world workflows such as LLM training remains a significant challenge. In this paper, we introduce TREX, a multi-agent system that automates the entire LLM training life cycle. By orchestrating collaboration between two core modules, the Researcher and the Executor, the system seamlessly performs requirement analysis, open-domain literature and data research, formulation of training strategies, preparation of data recipes, and model training and evaluation. The multi-round experimental process is modeled as a search tree, enabling the system to efficiently plan exploration paths, reuse historical results, and distill high-level insights from iterative trials. To evaluate automated LLM training capabilities, we construct FT-Bench, a benchmark of 10 tasks derived from real-world scenarios, ranging from optimizing fundamental model capabilities to improving performance on domain-specific tasks. Experimental results show that the TREX agent consistently improves model performance on the target tasks.
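
The abstract states that the multi-round experimental process is modeled as a search tree, but it does not specify the node contents or the expansion policy. The following is a minimal Python sketch under our own assumptions, not TREX's actual implementation: each node (the hypothetical `ExperimentNode`) stores one training configuration and its evaluation score, `explore` greedily refines the best-scoring trial found so far, and a cache keyed on the configuration reuses earlier results instead of re-running a trial. The callables `propose` and `evaluate` are hypothetical stand-ins that loosely play the roles of strategy planning and of training plus evaluation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterator, Optional


@dataclass(eq=False)
class ExperimentNode:
    """One fine-tuning trial: a config and its score (hypothetical schema)."""
    config: dict
    score: Optional[float] = None
    parent: Optional[ExperimentNode] = field(default=None, repr=False)
    children: list[ExperimentNode] = field(default_factory=list, repr=False)

    def add_child(self, config: dict) -> ExperimentNode:
        child = ExperimentNode(config=config, parent=self)
        self.children.append(child)
        return child

    def path(self) -> list[dict]:
        """Configs from the root down to this node (the trial's history)."""
        node, out = self, []
        while node is not None:
            out.append(node.config)
            node = node.parent
        return list(reversed(out))


def iter_nodes(root: ExperimentNode) -> Iterator[ExperimentNode]:
    """Depth-first traversal over every node in the tree."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(node.children)


def best_node(root: ExperimentNode) -> ExperimentNode:
    """Highest-scoring evaluated node in the tree."""
    scored = [n for n in iter_nodes(root) if n.score is not None]
    return max(scored, key=lambda n: n.score)


def config_key(config: dict) -> tuple:
    """Hashable key for a config (assumes hashable values)."""
    return tuple(sorted(config.items()))


def explore(
    root: ExperimentNode,
    propose: Callable[[dict], dict],     # hypothetical planner
    evaluate: Callable[[dict], float],   # hypothetical train + eval step
    rounds: int,
) -> ExperimentNode:
    """Greedy best-first search: always expand the best trial so far."""
    cache: dict[tuple, float] = {}
    if root.score is None:
        root.score = evaluate(root.config)
        cache[config_key(root.config)] = root.score
    for _ in range(rounds):
        parent = best_node(root)
        child = parent.add_child(propose(parent.config))
        key = config_key(child.config)
        if key not in cache:  # reuse historical results for repeated configs
            cache[key] = evaluate(child.config)
        child.score = cache[key]
    return best_node(root)
```

In a toy run where `propose` increments an `epochs` field and `evaluate` returns `-(epochs - 3) ** 2`, five rounds reach `epochs == 3`, and the fifth round re-proposes an already evaluated config that is served from the cache rather than re-evaluated. Greedy expansion is only one possible policy; a real system might instead balance exploration and exploitation.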
