ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization
February 6, 2025
Authors: Yinjie Wang, Ling Yang, Guohao Li, Mengdi Wang, Bryon Aragam
cs.AI
Abstract
Recent research has leveraged large language model multi-agent systems for complex problem-solving while trying to reduce the manual effort required to build them, driving the development of automated agent workflow optimization methods. However, existing methods remain inflexible due to representational limitations, a lack of adaptability, and poor scalability when relying on discrete optimization techniques. We address these challenges with ScoreFlow, a simple yet high-performance framework that leverages efficient gradient-based optimization in a continuous space. ScoreFlow incorporates Score-DPO, a novel variant of the direct preference optimization method that accounts for quantitative feedback. Across six benchmarks spanning question answering, coding, and mathematical reasoning, ScoreFlow achieves an 8.2% improvement over existing baselines. Moreover, it empowers smaller models to outperform larger ones with lower inference costs.

Project: https://github.com/Gen-Verse/ScoreFlow
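The abstract characterizes Score-DPO as a variant of direct preference optimization that incorporates quantitative feedback rather than only binary preferences. As a rough illustration of that idea, the minimal sketch below reweights a standard DPO loss by the evaluator score gap between the preferred and dispreferred workflow; the function name `score_dpo_loss`, the score-gap weighting, and all argument shapes are assumptions made for illustration, not the paper's exact objective.

```python
# Hedged sketch of a "score-aware" DPO loss. The score-gap weighting below is
# an assumed formulation for illustration only, NOT the paper's Score-DPO.
import torch
import torch.nn.functional as F

def score_dpo_loss(policy_chosen_logps: torch.Tensor,
                   policy_rejected_logps: torch.Tensor,
                   ref_chosen_logps: torch.Tensor,
                   ref_rejected_logps: torch.Tensor,
                   chosen_scores: torch.Tensor,
                   rejected_scores: torch.Tensor,
                   beta: float = 0.1) -> torch.Tensor:
    """Standard DPO margin, reweighted by the quantitative score gap (assumed)."""
    # DPO implicit-reward margin between preferred and dispreferred samples.
    logits = beta * ((policy_chosen_logps - ref_chosen_logps)
                     - (policy_rejected_logps - ref_rejected_logps))
    # Quantitative feedback: pairs whose chosen workflow scored much higher
    # contribute more to the gradient (assumption for this sketch).
    score_gap = (chosen_scores - rejected_scores).clamp(min=0.0)
    return -(score_gap * F.logsigmoid(logits)).mean()
```

In this reading, a score gap of zero silences the pair while large gaps amplify it, which is one plausible way to fold scalar evaluation scores into a preference-optimization objective.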