Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System
October 10, 2024
Authors: Weize Chen, Jiarui Yuan, Chen Qian, Cheng Yang, Zhiyuan Liu, Maosong Sun
cs.AI
Abstract
Large Language Model (LLM) based multi-agent systems (MAS) show remarkable
potential in collaborative problem-solving, yet they still face critical
challenges: low communication efficiency, poor scalability, and a lack of
effective parameter-updating optimization methods. We present Optima, a novel
framework that addresses these issues by significantly enhancing both
communication efficiency and task effectiveness in LLM-based MAS through LLM
training. Optima employs an iterative generate, rank, select, and train
paradigm with a reward function balancing task performance, token efficiency,
and communication readability. We explore various RL algorithms, including
Supervised Fine-Tuning, Direct Preference Optimization, and their hybrid
approaches, providing insights into their effectiveness-efficiency trade-offs.
We integrate Monte Carlo Tree Search-inspired techniques for DPO data
generation, treating conversation turns as tree nodes to explore diverse
interaction paths. Evaluated on common multi-agent tasks, including
information-asymmetric question answering and complex reasoning, Optima shows
consistent and substantial improvements over single-agent baselines and vanilla
MAS based on Llama 3 8B, achieving up to 2.8x performance gain with less than
10% tokens on tasks requiring heavy information exchange. Moreover, Optima's
efficiency gains open new possibilities for leveraging inference-compute more
effectively, leading to improved inference-time scaling laws. By addressing
fundamental challenges in LLM-based MAS, Optima shows the potential towards
scalable, efficient, and effective MAS
(https://chenweize1998.github.io/optima-project-page).
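The abstract describes a reward function that balances task performance, token efficiency, and communication readability. A minimal sketch of how such a combined reward could look is below; the weights, the token-budget normalization, and the function signature are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of a reward balancing task performance, token
# efficiency, and communication readability. Weights and the budget-based
# efficiency term are assumptions for illustration only.

def reward(task_score: float, num_tokens: int, readability: float,
           max_tokens: int = 2048,
           w_task: float = 1.0, w_tok: float = 0.5, w_read: float = 0.1) -> float:
    """Higher is better; token usage is penalized relative to a budget."""
    # Fewer tokens spent relative to the budget -> higher efficiency term.
    token_efficiency = 1.0 - min(num_tokens / max_tokens, 1.0)
    return w_task * task_score + w_tok * token_efficiency + w_read * readability
```

Under this sketch, a trajectory that solves the task with fewer tokens scores strictly higher than one that solves it verbosely, which is the trade-off the generate-rank-select-train loop would then optimize.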
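The abstract also mentions treating conversation turns as tree nodes, MCTS-style, to explore interaction paths for DPO data generation. The sketch below shows one simple way such a tree could yield preference pairs (highest- vs. lowest-reward completions); the node fields and the pairing rule are assumptions, not the paper's method.

```python
# Illustrative sketch: conversation turns as tree nodes, with leaf paths
# ranked by reward to form DPO (chosen, rejected) pairs. The TurnNode
# structure and best-vs-worst pairing rule are hypothetical.
from dataclasses import dataclass, field

@dataclass
class TurnNode:
    utterance: str
    reward: float = 0.0
    children: list = field(default_factory=list)

def leaves_with_path(node, path=()):
    """Yield (path-of-utterances, reward) for every leaf under `node`."""
    path = path + (node.utterance,)
    if not node.children:
        yield path, node.reward
    else:
        for child in node.children:
            yield from leaves_with_path(child, path)

def dpo_pairs(root):
    """Pair the best-scoring leaf path against the worst as one DPO example."""
    leaves = sorted(leaves_with_path(root), key=lambda x: x[1], reverse=True)
    if len(leaves) < 2:
        return []
    return [(leaves[0][0], leaves[-1][0])]
```

For example, expanding two alternative replies under the same question node and scoring them with the reward would produce one (chosen, rejected) pair sharing the question as a common prefix.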