ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization
February 6, 2025
Authors: Yinjie Wang, Ling Yang, Guohao Li, Mengdi Wang, Bryon Aragam
cs.AI
Abstract
Recent research has leveraged large language model multi-agent systems for complex problem-solving while trying to reduce the manual effort required to build them, driving the development of automated agent workflow optimization methods. However, existing methods remain inflexible due to representational limitations, a lack of adaptability, and poor scalability when relying on discrete optimization techniques. We address these challenges with ScoreFlow, a simple yet high-performance framework that leverages efficient gradient-based optimization in a continuous space. ScoreFlow incorporates Score-DPO, a novel variant of the direct preference optimization method that accounts for quantitative feedback. Across six benchmarks spanning question answering, coding, and mathematical reasoning, ScoreFlow achieves an 8.2% improvement over existing baselines. Moreover, it empowers smaller models to outperform larger ones with lower inference costs.

Project: https://github.com/Gen-Verse/ScoreFlow
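The abstract characterizes Score-DPO as a variant of direct preference optimization that incorporates quantitative feedback rather than only binary preferences. As a rough illustration of that idea, the minimal sketch below reweights a standard DPO loss by the evaluator score gap between the preferred and dispreferred workflow; the function name `score_dpo_loss`, the score-gap weighting, and all argument shapes are assumptions made for illustration, not the paper's exact objective.

```python
# Hedged sketch of a "score-aware" DPO loss. The score-gap weighting below is
# an assumed formulation for illustration only, NOT the paper's Score-DPO.
import torch
import torch.nn.functional as F

def score_dpo_loss(policy_chosen_logps: torch.Tensor,
                   policy_rejected_logps: torch.Tensor,
                   ref_chosen_logps: torch.Tensor,
                   ref_rejected_logps: torch.Tensor,
                   chosen_scores: torch.Tensor,
                   rejected_scores: torch.Tensor,
                   beta: float = 0.1) -> torch.Tensor:
    """Standard DPO margin, reweighted by the quantitative score gap (assumed)."""
    # DPO implicit-reward margin between preferred and dispreferred samples.
    logits = beta * ((policy_chosen_logps - ref_chosen_logps)
                     - (policy_rejected_logps - ref_rejected_logps))
    # Quantitative feedback: pairs whose chosen workflow scored much higher
    # contribute more to the gradient (assumption for this sketch).
    score_gap = (chosen_scores - rejected_scores).clamp(min=0.0)
    return -(score_gap * F.logsigmoid(logits)).mean()
```

In this reading, a score gap of zero silences the pair while large gaps amplify it, which is one plausible way to fold scalar evaluation scores into a preference-optimization objective.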