MALT: Improving Reasoning with Multi-Agent LLM Training

Abstract

Enabling effective collaboration among LLMs is a crucial step toward developing autonomous systems capable of solving complex problems. LLMs are typically used as single-model generators whose outputs humans critique and refine, and the potential of jointly trained collaborative models remains largely unexplored. Despite promising results in multi-agent communication and debate settings, little progress has been made in training models to work together on tasks. In this paper, we present a first step toward "Multi-agent LLM training" (MALT) on reasoning problems. Our approach employs a sequential multi-agent setup in which heterogeneous LLMs are assigned specialized roles: a generator, a verifier, and a refinement model that iteratively solve problems. We propose a trajectory-expansion-based synthetic data generation process and a credit assignment strategy driven by joint outcome-based rewards. This enables our post-training setup to use both positive and negative trajectories to autonomously improve each model's specialized capabilities as part of a joint sequential system. We evaluate our approach on MATH, GSM8k, and CQA, where MALT on Llama 3.1 8B models achieves relative improvements of 14.14%, 7.12%, and 9.40%, respectively, over the same baseline model. This demonstrates an early advance in multi-agent cooperative capabilities on mathematical and common-sense reasoning questions. More generally, our work provides a concrete direction for research on multi-agent LLM training approaches.
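The abstract names the main ingredients (a sequential generator, verifier, and refiner; trajectory expansion; credit assignment from a joint outcome reward) but gives no algorithmic detail. The following is a minimal sketch of one way these pieces could fit together, not the authors' implementation. Everything in it is an assumption made for illustration: the function names `expand_trajectories` and `assign_credit`, the branching factor `n`, exact-match scoring against a reference answer, averaging leaf rewards up the tree, and the 0.5 threshold used to split outputs into positive and negative examples.

```python
import random
from statistics import mean


def expand_trajectories(question, answer, generate, verify, refine, n=2):
    """Sample n outputs per role, yielding n**3 generator -> verifier -> refiner paths.

    `generate`, `verify`, and `refine` are hypothetical callables standing in
    for the three role-specialized models.
    """
    tree = []
    for _ in range(n):
        g = generate(question)
        g_node = {"output": g, "children": []}
        for _ in range(n):
            v = verify(question, g)
            v_node = {"output": v, "children": []}
            for _ in range(n):
                r = refine(question, g, v)
                # Joint outcome reward: only the final answer is scored.
                # Exact match is an assumed scoring rule.
                reward = float(r == answer)
                v_node["children"].append({"output": r, "reward": reward})
            g_node["children"].append(v_node)
        tree.append(g_node)
    return tree


def assign_credit(tree, threshold=0.5):
    """Label each role's outputs as positive or negative from downstream rewards.

    Assumed rule: a node's value is the mean reward of the leaves below it,
    and a value above `threshold` marks the output as a positive example.
    """
    labels = {"generator": [], "verifier": [], "refiner": []}
    for g_node in tree:
        v_values = []
        for v_node in g_node["children"]:
            for leaf in v_node["children"]:
                labels["refiner"].append((leaf["output"], leaf["reward"] > threshold))
            v_value = mean(leaf["reward"] for leaf in v_node["children"])
            v_values.append(v_value)
            labels["verifier"].append((v_node["output"], v_value > threshold))
        labels["generator"].append((g_node["output"], mean(v_values) > threshold))
    return labels


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-ins for the three models; a real setup would call LLMs here.
    tree = expand_trajectories(
        "What is 2 + 2?",
        "4",
        generate=lambda q: rng.choice(["4", "5"]),
        verify=lambda q, g: "looks right" if g == "4" else "check the sum",
        refine=lambda q, g, v: "4" if v == "check the sum" else g,
        n=2,
    )
    for role, examples in assign_credit(tree).items():
        print(role, examples)
```

Under these assumptions, the positive and negative labels per role could then serve as training data for each model separately, which is one reading of how the post-training setup uses "both positive and negative trajectories".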
