QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search

Abstract

Language agents have become a promising solution to complex interactive tasks. A key ingredient in their success is the reward model applied to the trajectory of the agentic workflow, which provides valuable guidance during training or inference. However, because annotations of intermediate interactions are scarce, most existing works use an outcome reward model to optimize policies across entire trajectories. This may lead to sub-optimal policies and hinder overall performance. To address this, we propose QLASS (Q-guided Language Agent Stepwise Search), which automatically generates annotations by estimating Q-values in a stepwise manner for open language agents. By introducing a reasoning tree and performing process reward modeling, QLASS provides effective intermediate guidance for each step. With this stepwise guidance, we propose a Q-guided generation strategy that enables language agents to better adapt to long-term value, resulting in significant performance improvement during model inference on complex interactive agent tasks. Notably, even with almost half the annotated data, QLASS retains strong performance, demonstrating its efficiency in handling limited supervision. We also empirically demonstrate through qualitative analysis that QLASS can lead to more effective decision making. We will release our code and data.
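The idea of Q-guided stepwise generation can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`q_guided_rollout`, `propose_actions`, `q_value`, `transition`) and the toy counting environment are assumptions made purely for illustration. In a real agent, the candidate actions would presumably be sampled from a language model and the scores would come from a learned Q-value estimator. The sketch only shows the selection rule: at each step, score every candidate action with a stepwise Q-value and take the highest-scoring one, rather than judging only the final outcome.

```python
from typing import Callable, List, Sequence, Tuple

TARGET = 5  # hypothetical goal state for the toy environment


def q_guided_rollout(
    initial_state: int,
    propose_actions: Callable[[int], Sequence[str]],
    q_value: Callable[[int, str], float],
    transition: Callable[[int, str], Tuple[int, bool]],
    max_steps: int = 10,
) -> List[str]:
    """Greedy stepwise search: at each step pick the candidate with the highest Q-value."""
    state = initial_state
    chosen: List[str] = []
    for _ in range(max_steps):
        candidates = propose_actions(state)
        if not candidates:
            break
        # Stepwise guidance: rank candidates by their estimated long-term value.
        best = max(candidates, key=lambda action: q_value(state, action))
        chosen.append(best)
        state, done = transition(state, best)
        if done:
            break
    return chosen


# --- Toy stand-ins (assumptions, not part of QLASS) ---

def toy_propose(state: int) -> Sequence[str]:
    # A real agent would sample candidate actions from a language model.
    return ["+1", "+2"]


def toy_transition(state: int, action: str) -> Tuple[int, bool]:
    next_state = state + int(action)
    return next_state, next_state == TARGET


def toy_q(state: int, action: str) -> float:
    # Stand-in for a learned Q-model: closer to the target is better.
    return -abs(TARGET - (state + int(action)))


if __name__ == "__main__":
    print(q_guided_rollout(0, toy_propose, toy_q, toy_transition))
```

The greedy choice is the simplest possible use of stepwise Q-values; other selection rules over the same scores would slot into the same loop.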
