DocDancer: Towards Agentic Document-Grounded Information Seeking
January 8, 2026
Authors: Qintong Zhang, Xinjie Lv, Jialong Wu, Baixuan Li, Zhengwei Tao, Guochen Yan, Huanyao Zhang, Bin Wang, Jiahao Xu, Haitao Mi, Wentao Zhang
cs.AI
Abstract
Document Question Answering (DocQA) focuses on answering questions grounded in given documents, yet existing DocQA agents lack effective tool utilization and largely rely on closed-source models. In this work, we introduce DocDancer, an end-to-end trained open-source document agent. We formulate DocQA as an information-seeking problem and propose a tool-driven agent framework that explicitly models document exploration and comprehension. To enable end-to-end training of such agents, we introduce an Exploration-then-Synthesis data synthesis pipeline that addresses the scarcity of high-quality training data for DocQA. Models trained on the synthesized data demonstrate their effectiveness on two long-context document understanding benchmarks, MMLongBench-Doc and DocBench. Further analysis provides valuable insights into agentic tool design and synthetic data.