ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization

Abstract

Recent research has leveraged large language model multi-agent systems for complex problem-solving while trying to reduce the manual effort required to build them, driving the development of automated agent workflow optimization methods. However, existing methods remain inflexible due to representational limitations, a lack of adaptability, and poor scalability when relying on discrete optimization techniques. We address these challenges with ScoreFlow, a simple yet high-performance framework that leverages efficient gradient-based optimization in a continuous space. ScoreFlow incorporates Score-DPO, a novel variant of the direct preference optimization method that accounts for quantitative feedback. Across six benchmarks spanning question answering, coding, and mathematical reasoning, ScoreFlow achieves an 8.2% improvement over existing baselines. Moreover, it enables smaller models to outperform larger ones at lower inference cost. Project: https://github.com/Gen-Verse/ScoreFlow
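To make "a variant of direct preference optimization that accounts for quantitative feedback" concrete, the sketch below computes the standard DPO loss for one preference pair and then scales it by the gap between the two evaluation scores. The score-gap weighting is an illustrative assumption, not the Score-DPO objective defined in the paper, which the abstract does not spell out. The function names, the `beta` default, and the convention that scores lie in [0, 1] are likewise hypothetical.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_logp_w: float, policy_logp_l: float,
             ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (preferred, rejected) pair.

    Each argument is the summed log-probability of a full response
    (here: a generated workflow) under the policy or the frozen
    reference model. "w" marks the preferred response, "l" the rejected one.
    """
    margin = beta * ((policy_logp_w - ref_logp_w)
                     - (policy_logp_l - ref_logp_l))
    return -log_sigmoid(margin)


def score_weighted_dpo_loss(policy_logp_w: float, policy_logp_l: float,
                            ref_logp_w: float, ref_logp_l: float,
                            score_w: float, score_l: float,
                            beta: float = 0.1) -> float:
    """Hypothetical score-aware variant (an assumption for illustration).

    Pairs whose evaluation scores differ more contribute more to the
    loss, so the quantitative feedback is not reduced to a binary
    "better / worse" label.
    """
    if score_w <= score_l:
        raise ValueError("the preferred response must have the higher score")
    weight = score_w - score_l  # assumed weighting: the raw score gap
    return weight * dpo_loss(policy_logp_w, policy_logp_l,
                             ref_logp_w, ref_logp_l, beta)


if __name__ == "__main__":
    # Policy already slightly favors the higher-scoring workflow.
    loss = score_weighted_dpo_loss(-1.0, -3.0, -2.0, -2.0,
                                   score_w=0.9, score_l=0.4)
    print(f"weighted loss: {loss:.4f}")
```

In this sketch, a pair scored 0.9 versus 0.4 carries five times the weight of a pair scored 0.9 versus 0.8. That is one simple way for a loss to use the size of the score difference and not only its sign.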
