Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System

Abstract

Large language model (LLM)-based multi-agent systems (MAS) show remarkable potential in collaborative problem-solving, yet they still face critical challenges: low communication efficiency, poor scalability, and a lack of effective parameter-updating optimization methods. We present Optima, a novel framework that addresses these issues by significantly enhancing both communication efficiency and task effectiveness in LLM-based MAS through LLM training. Optima employs an iterative generate, rank, select, and train paradigm with a reward function that balances task performance, token efficiency, and communication readability. We explore various training algorithms, including Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and hybrid approaches that combine them, providing insights into their effectiveness-efficiency trade-offs. We integrate Monte Carlo Tree Search (MCTS)-inspired techniques into DPO data generation, treating conversation turns as tree nodes to explore diverse interaction paths. Evaluated on common multi-agent tasks, including information-asymmetric question answering and complex reasoning, Optima shows consistent and substantial improvements over single-agent baselines and vanilla MAS based on Llama 3 8B, achieving up to a 2.8x performance gain while using fewer than 10% of the tokens on tasks requiring heavy information exchange. Moreover, Optima's efficiency gains open new possibilities for using inference compute more effectively, leading to improved inference-time scaling laws. By addressing fundamental challenges in LLM-based MAS, Optima shows potential toward scalable, efficient, and effective MAS (https://chenweize1998.github.io/optima-project-page).
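The abstract names the three components of the reward (task performance, token efficiency, readability) and the rank-and-select step, but not their exact form. The following is a minimal sketch assuming a simple linear combination of the three terms. The `Trajectory` class, the functions `reward`, `rank`, `rank_and_select`, and `make_dpo_pair`, the weights `lam_token` and `lam_read`, and the readability score are all hypothetical illustrations, not Optima's implementation. In particular, the DPO pairing here is a plain best-versus-worst choice, not the MCTS-inspired data generation the abstract describes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Trajectory:
    """Hypothetical summary of one sampled multi-agent conversation."""
    task_score: float   # task performance, e.g. accuracy or F1 in [0, 1]
    num_tokens: int     # total tokens exchanged by all agents
    readability: float  # hypothetical readability proxy in [0, 1]; higher is more readable


def reward(t: Trajectory, max_tokens: int,
           lam_token: float = 0.5, lam_read: float = 0.2) -> float:
    """Assumed linear reward: task score, minus a normalized token cost,
    plus a readability bonus. The weights are illustrative only."""
    token_cost = t.num_tokens / max_tokens  # normalized to [0, 1] within the batch
    return t.task_score - lam_token * token_cost + lam_read * t.readability


def rank(trajs: List[Trajectory]) -> List[Trajectory]:
    """Sort trajectories from highest to lowest reward."""
    max_tokens = max(1, max(t.num_tokens for t in trajs))  # avoid division by zero
    return sorted(trajs, key=lambda t: reward(t, max_tokens), reverse=True)


def rank_and_select(trajs: List[Trajectory], top_k: int) -> List[Trajectory]:
    """Keep the top_k highest-reward trajectories, e.g. as SFT targets."""
    return rank(trajs)[:top_k]


def make_dpo_pair(trajs: List[Trajectory]) -> Tuple[Trajectory, Trajectory]:
    """Simplified preference pair: highest reward is 'chosen', lowest is 'rejected'."""
    ranked = rank(trajs)
    return ranked[0], ranked[-1]


# Usage: a short, correct conversation outranks both a long correct one
# and a short low-scoring one.
samples = [
    Trajectory(task_score=0.9, num_tokens=800, readability=0.7),
    Trajectory(task_score=0.9, num_tokens=200, readability=0.6),
    Trajectory(task_score=0.4, num_tokens=100, readability=0.9),
]
best = rank_and_select(samples, top_k=1)
chosen, rejected = make_dpo_pair(samples)
```

Normalizing the token cost by the longest sampled conversation keeps the three terms on comparable scales; different weights or normalization would change which trajectories are selected for training.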