Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System

Abstract

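Large language model (LLM) based multi-agent systems (MAS) show remarkable potential in collaborative problem solving, but they still face critical challenges: low communication efficiency, poor scalability, and a lack of effective optimization methods that update model parameters. This work proposes Optima, a new framework that addresses these problems by significantly improving both communication efficiency and task effectiveness in LLM-based MAS through LLM training. Optima adopts an iterative generate, rank, select, and train paradigm, using a reward function that balances task performance, token efficiency, and communication readability. We explore various RL algorithms, including Supervised Fine-Tuning and Direct Preference Optimization, and provide insights into their effectiveness-efficiency trade-offs. We integrate Monte Carlo Tree Search-inspired techniques for DPO data generation, treating conversation turns as tree nodes to explore diverse interaction paths. Evaluated on common multi-agent tasks, including information-asymmetric question answering and complex reasoning, Optima achieves up to a 2.8x performance gain while using less than 10% of the tokens on tasks requiring heavy information exchange. Moreover, Optima's efficiency gains open new possibilities for using inference compute more effectively, leading to improved inference-time scaling laws. By addressing fundamental challenges in LLM-based MAS, Optima shows potential for scalable, efficient, and effective MAS.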

English

Large Language Model (LLM) based multi-agent systems (MAS) show remarkable potential in collaborative problem-solving, yet they still face critical challenges: low communication efficiency, poor scalability, and a lack of effective parameter-updating optimization methods. We present Optima, a novel framework that addresses these issues by significantly enhancing both communication efficiency and task effectiveness in LLM-based MAS through LLM training. Optima employs an iterative generate, rank, select, and train paradigm with a reward function balancing task performance, token efficiency, and communication readability. We explore various RL algorithms, including Supervised Fine-Tuning, Direct Preference Optimization, and their hybrid approaches, providing insights into their effectiveness-efficiency trade-offs. We integrate Monte Carlo Tree Search-inspired techniques for DPO data generation, treating conversation turns as tree nodes to explore diverse interaction paths. Evaluated on common multi-agent tasks, including information-asymmetric question answering and complex reasoning, Optima shows consistent and substantial improvements over single-agent baselines and vanilla MAS based on Llama 3 8B, achieving up to a 2.8x performance gain with less than 10% of the tokens on tasks requiring heavy information exchange. Moreover, Optima's efficiency gains open new possibilities for leveraging inference compute more effectively, leading to improved inference-time scaling laws. By addressing fundamental challenges in LLM-based MAS, Optima shows potential for scalable, efficient, and effective MAS (https://chenweize1998.github.io/optima-project-page).
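The rank and select steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `Candidate` container, the weights `w_token` and `w_read`, the use of inverse language-model loss as a readability proxy, and the helper names are all assumptions made for illustration. The abstract states only that the reward balances task performance, token efficiency, and communication readability.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One sampled multi-agent conversation (hypothetical container)."""
    dialogue: str
    task_score: float  # task metric in [0, 1], e.g. answer accuracy
    num_tokens: int    # tokens exchanged between the agents
    lm_loss: float     # assumed readability proxy: loss of a reference LM


def reward(c: Candidate, max_tokens: int,
           w_token: float = 0.5, w_read: float = 0.1) -> float:
    """Combine the three terms; the weights are illustrative assumptions."""
    token_cost = c.num_tokens / max_tokens     # normalized to (0, 1]
    readability = 1.0 / max(c.lm_loss, 1e-6)   # lower loss reads better
    return c.task_score - w_token * token_cost + w_read * readability


def rank_and_select(candidates: list[Candidate], top_k: int = 1) -> list[Candidate]:
    """Rank sampled conversations by reward and keep the best top_k."""
    max_tokens = max(c.num_tokens for c in candidates)
    ranked = sorted(candidates, key=lambda c: reward(c, max_tokens), reverse=True)
    return ranked[:top_k]


def preference_pair(candidates: list[Candidate]) -> tuple[Candidate, Candidate]:
    """Build one (chosen, rejected) pair, as a DPO-style trainer would consume."""
    max_tokens = max(c.num_tokens for c in candidates)
    ranked = sorted(candidates, key=lambda c: reward(c, max_tokens), reverse=True)
    return ranked[0], ranked[-1]


samples = [
    Candidate("short and correct", task_score=1.0, num_tokens=50, lm_loss=2.0),
    Candidate("long and correct", task_score=1.0, num_tokens=500, lm_loss=2.0),
    Candidate("short and wrong", task_score=0.0, num_tokens=50, lm_loss=2.0),
]
best = rank_and_select(samples, top_k=1)
chosen, rejected = preference_pair(samples)
```

In this sketch the selected conversations would feed a supervised fine-tuning step, while the (chosen, rejected) pairs would feed a preference-optimization step. The tree-search-based pair construction mentioned in the abstract is not modeled here.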

Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System
