DLLM-Searcher: Adapting Diffusion Large Language Model for Search Agents

Abstract

Diffusion Large Language Models (dLLMs) have recently demonstrated distinctive efficiency advantages, enabled by their inherently parallel decoding mechanism and flexible generation paradigm. Meanwhile, despite the rapid advancement of Search Agents, their practical deployment is constrained by a fundamental limitation, 1) the **Latency Challenge**: under the ReAct agent paradigm, multi-round reasoning, tool calling, and waiting for tool responses execute serially, which induces severe end-to-end latency. Intuitively, dLLMs could leverage their distinctive strengths to improve the operational efficiency of agents under the ReAct paradigm. In practice, however, existing dLLM backbones face 2) the **Agent Ability Challenge**: they exhibit remarkably weak reasoning and tool-calling capabilities, which prevents these efficiency advantages from being realized. In this paper, we propose DLLM-Searcher, an optimization framework for dLLM-based Search Agents. To address the Agent Ability Challenge, we design a two-stage post-training pipeline consisting of Agentic Supervised Fine-Tuning (Agentic SFT) and Agentic Variance-Reduced Preference Optimization (Agentic VRPO), which enhances the backbone dLLM's information-seeking and reasoning capabilities. To mitigate the Latency Challenge, we leverage the flexible generation mechanism of dLLMs and propose a novel agent paradigm termed **Parallel-Reasoning and Acting (P-ReAct)**. P-ReAct guides the model to prioritize decoding tool_call instructions, which allows the model to keep thinking while it waits for the tool's return. Experimental results demonstrate that DLLM-Searcher achieves performance comparable to mainstream LLM-based search agents and that P-ReAct delivers approximately 15% inference acceleration. Our code is available at https://anonymous.4open.science/r/DLLM-Searcher-553C
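The latency argument behind P-ReAct can be illustrated with a toy timing model. The sketch below is not the authors' implementation: the function names, the sleep-based stand-ins for decoding and for the search tool, and the durations are all hypothetical. It only contrasts a serial round (think, emit the call, wait) with a round in which the tool call is emitted first and the remaining reasoning overlaps with the tool's round trip.

```python
import asyncio
import time

# Hypothetical placeholder durations, not measurements from the paper.
THINK_SECONDS = 0.05  # time to decode the reasoning tokens
TOOL_SECONDS = 0.05   # search-tool round-trip time


async def decode_tool_call() -> str:
    """Stand-in for decoding the tool_call span (assumed near-instant here)."""
    return "search('example query')"


async def decode_thinking() -> str:
    """Stand-in for decoding the remaining reasoning tokens."""
    await asyncio.sleep(THINK_SECONDS)
    return "reasoning about the query"


async def run_tool(call: str) -> str:
    """Stand-in for the external search tool."""
    await asyncio.sleep(TOOL_SECONDS)
    return f"results for {call}"


async def serial_round() -> float:
    """Serial round: think, then emit the call, then wait for the tool."""
    start = time.perf_counter()
    await decode_thinking()
    call = await decode_tool_call()
    await run_tool(call)
    return time.perf_counter() - start


async def overlapped_round() -> float:
    """Overlapped round: emit the call first, keep thinking while the tool runs."""
    start = time.perf_counter()
    call = await decode_tool_call()
    await asyncio.gather(run_tool(call), decode_thinking())
    return time.perf_counter() - start


def compare() -> tuple[float, float]:
    """Return (serial, overlapped) wall-clock times for one round each."""
    serial = asyncio.run(serial_round())
    overlapped = asyncio.run(overlapped_round())
    return serial, overlapped
```

In this toy model the overlapped round costs roughly the longer of the two waits instead of their sum. The real gain depends on how decoding time compares with tool latency; the abstract reports approximately 15% inference acceleration for P-ReAct.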
