Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System
October 10, 2024
Authors: Weize Chen, Jiarui Yuan, Chen Qian, Cheng Yang, Zhiyuan Liu, Maosong Sun
cs.AI
Abstract
Large Language Model (LLM) based multi-agent systems (MAS) show remarkable
potential in collaborative problem-solving, yet they still face critical
challenges: low communication efficiency, poor scalability, and a lack of
effective parameter-updating optimization methods. We present Optima, a novel
framework that addresses these issues by significantly enhancing both
communication efficiency and task effectiveness in LLM-based MAS through LLM
training. Optima employs an iterative generate, rank, select, and train
paradigm with a reward function balancing task performance, token efficiency,
and communication readability. We explore various RL algorithms, including
Supervised Fine-Tuning, Direct Preference Optimization, and their hybrid
approaches, providing insights into their effectiveness-efficiency trade-offs.
We integrate Monte Carlo Tree Search-inspired techniques for DPO data
generation, treating conversation turns as tree nodes to explore diverse
interaction paths. Evaluated on common multi-agent tasks, including
information-asymmetric question answering and complex reasoning, Optima shows
consistent and substantial improvements over single-agent baselines and vanilla
MAS based on Llama 3 8B, achieving up to 2.8x performance gain with less than
10% tokens on tasks requiring heavy information exchange. Moreover, Optima's
efficiency gains open new possibilities for leveraging inference-compute more
effectively, leading to improved inference-time scaling laws. By addressing
fundamental challenges in LLM-based MAS, Optima shows the potential towards
scalable, efficient, and effective MAS
(https://chenweize1998.github.io/optima-project-page).
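The abstract describes a reward function that balances task performance, token efficiency, and communication readability. A minimal sketch of how such a combined reward could look is below; the weights, the token-budget normalization, and the function signature are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of a reward balancing task performance, token
# efficiency, and communication readability. Weights and the budget-based
# efficiency term are assumptions for illustration only.

def reward(task_score: float, num_tokens: int, readability: float,
           max_tokens: int = 2048,
           w_task: float = 1.0, w_tok: float = 0.5, w_read: float = 0.1) -> float:
    """Higher is better; token usage is penalized relative to a budget."""
    # Fewer tokens spent relative to the budget -> higher efficiency term.
    token_efficiency = 1.0 - min(num_tokens / max_tokens, 1.0)
    return w_task * task_score + w_tok * token_efficiency + w_read * readability
```

Under this sketch, a trajectory that solves the task with fewer tokens scores strictly higher than one that solves it verbosely, which is the trade-off the generate-rank-select-train loop would then optimize.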
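The abstract also mentions treating conversation turns as tree nodes, MCTS-style, to explore interaction paths for DPO data generation. The sketch below shows one simple way such a tree could yield preference pairs (highest- vs. lowest-reward completions); the node fields and the pairing rule are assumptions, not the paper's method.

```python
# Illustrative sketch: conversation turns as tree nodes, with leaf paths
# ranked by reward to form DPO (chosen, rejected) pairs. The TurnNode
# structure and best-vs-worst pairing rule are hypothetical.
from dataclasses import dataclass, field

@dataclass
class TurnNode:
    utterance: str
    reward: float = 0.0
    children: list = field(default_factory=list)

def leaves_with_path(node, path=()):
    """Yield (path-of-utterances, reward) for every leaf under `node`."""
    path = path + (node.utterance,)
    if not node.children:
        yield path, node.reward
    else:
        for child in node.children:
            yield from leaves_with_path(child, path)

def dpo_pairs(root):
    """Pair the best-scoring leaf path against the worst as one DPO example."""
    leaves = sorted(leaves_with_path(root), key=lambda x: x[1], reverse=True)
    if len(leaves) < 2:
        return []
    return [(leaves[0][0], leaves[-1][0])]
```

For example, expanding two alternative replies under the same question node and scoring them with the reward would produce one (chosen, rejected) pair sharing the question as a common prefix.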