
PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR

January 26, 2026
Authors: James Burgess, Jan N. Hansen, Duo Peng, Yuhui Zhang, Alejandro Lozano, Min Woo Sun, Emma Lundberg, Serena Yeung-Levy
cs.AI

Abstract

Search agents are language models (LMs) that reason and search knowledge bases (or the web) to answer questions; recent methods supervise only the final answer accuracy using reinforcement learning with verifiable rewards (RLVR). Most RLVR search agents tackle general-domain QA, which limits their relevance to technical AI systems in science, engineering, and medicine. In this work we propose training agents to search and reason over scientific papers -- this tests technical question-answering, it is directly relevant to real scientists, and the capabilities will be crucial to future AI Scientist systems. Concretely, we release a search corpus of 16 million biomedical paper abstracts and construct a challenging factoid QA dataset called PaperSearchQA with 60k samples answerable from the corpus, along with benchmarks. We train search agents in this environment to outperform non-RL retrieval baselines; we also perform further quantitative analysis and observe interesting agent behaviors like planning, reasoning, and self-verification. Our corpus, datasets, and benchmarks are usable with the popular Search-R1 codebase for RLVR training and released on https://huggingface.co/collections/jmhb/papersearchqa. Finally, our data creation methods are scalable and easily extendable to other scientific domains.
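
To make "supervise only the final answer accuracy" concrete, the following is a minimal sketch of a verifiable reward for factoid QA. It is an illustration under stated assumptions, not the authors' implementation: the `<answer>...</answer>` tag format, the normalization rules, and all function names are hypothetical choices made for this example.

```python
import re
import string
from typing import List, Optional


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def extract_answer(rollout: str) -> Optional[str]:
    """Return the content of the last <answer>...</answer> span, if any.

    The tag format is an assumption made for this sketch.
    """
    matches = re.findall(r"<answer>(.*?)</answer>", rollout, flags=re.DOTALL)
    return matches[-1].strip() if matches else None


def exact_match_reward(rollout: str, gold_answers: List[str]) -> float:
    """Reward 1.0 if the final answer matches any gold answer, else 0.0.

    Only the final answer is scored; intermediate reasoning and search
    calls in the rollout receive no direct supervision.
    """
    predicted = extract_answer(rollout)
    if predicted is None:
        return 0.0
    pred = normalize(predicted)
    return 1.0 if any(pred == normalize(gold) for gold in gold_answers) else 0.0
```

For example, a rollout that ends in `<answer>The BRCA1</answer>` would score 1.0 against the gold answer `BRCA1` under this normalization, while a rollout with no answer span would score 0.0.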