
DLLM-Searcher: Adapting Diffusion Large Language Model for Search Agents

February 3, 2026
作者: Jiahao Zhao, Shaoxuan Xu, Zhongxiang Sun, Fengqi Zhu, Jingyang Ou, Yuling Shi, Chongxuan Li, Xiao Zhang, Jun Xu
cs.AI

Abstract

Recently, Diffusion Large Language Models (dLLMs) have demonstrated unique efficiency advantages, enabled by their inherently parallel decoding mechanism and flexible generation paradigm. Meanwhile, despite the rapid advancement of Search Agents, their practical deployment is constrained by the 1) Latency Challenge: under the ReAct agent paradigm, the serial execution of multi-round reasoning, tool calling, and tool-response waiting induces severe end-to-end latency. In principle, dLLMs could leverage their distinctive strengths to improve the operational efficiency of ReAct-style agents. In practice, however, existing dLLM backbones face the 2) Agent Ability Challenge: they exhibit markedly weak reasoning and tool-calling capabilities, preventing these advantages from being effectively realized. In this paper, we propose DLLM-Searcher, an optimization framework for dLLM-based Search Agents. To address the Agent Ability Challenge, we design a two-stage post-training pipeline comprising Agentic Supervised Fine-Tuning (Agentic SFT) and Agentic Variance-Reduced Preference Optimization (Agentic VRPO), which strengthens the backbone dLLM's information-seeking and reasoning capabilities. To mitigate the Latency Challenge, we exploit the flexible generation mechanism of dLLMs and propose a novel agent paradigm termed Parallel-Reasoning and Acting (P-ReAct). P-ReAct guides the model to prioritize decoding tool_call instructions, allowing it to keep thinking while waiting for the tool's return. Experimental results demonstrate that DLLM-Searcher achieves performance comparable to mainstream LLM-based search agents, while P-ReAct delivers approximately 15% inference acceleration. Our code is available at https://anonymous.4open.science/r/DLLM-Searcher-553C
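The core scheduling idea behind P-ReAct — emit the tool call first, then overlap further reasoning with the wait for the tool's response — can be sketched with plain async concurrency. This is an illustrative toy, not the paper's implementation: all function names (`call_tool`, `keep_thinking`, `p_react_step`) and timings are hypothetical stand-ins for decoding and tool latency.

```python
import asyncio

# Hypothetical sketch of the P-ReAct idea: the agent decodes the tool_call
# first, then keeps "thinking" while the tool response is still in flight.
# All names and delays here are illustrative, not the paper's actual code.

async def call_tool(query: str) -> str:
    # Stand-in for a search tool with network latency.
    await asyncio.sleep(0.2)
    return f"results for {query!r}"

async def keep_thinking(steps: int) -> list[str]:
    # Stand-in for decoding additional reasoning tokens in parallel.
    thoughts = []
    for i in range(steps):
        await asyncio.sleep(0.05)
        thoughts.append(f"thought-{i}")
    return thoughts

async def p_react_step(query: str) -> tuple[str, list[str]]:
    # Launch the tool call immediately (tool_call decoded first), then
    # overlap further reasoning with the wait instead of blocking on it.
    tool_task = asyncio.create_task(call_tool(query))
    thoughts = await keep_thinking(steps=3)
    observation = await tool_task
    return observation, thoughts

observation, thoughts = asyncio.run(p_react_step("diffusion LLM agents"))
print(observation)    # tool result, obtained with no idle waiting
print(len(thoughts))  # reasoning produced during the tool call
```

Under plain ReAct the 0.2 s tool wait and the reasoning steps would run serially; here they overlap, which is the source of the latency reduction the abstract attributes to P-ReAct.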