PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR
January 26, 2026
Authors: James Burgess, Jan N. Hansen, Duo Peng, Yuhui Zhang, Alejandro Lozano, Min Woo Sun, Emma Lundberg, Serena Yeung-Levy
cs.AI
Abstract
Search agents are language models (LMs) that reason and search knowledge bases (or the web) to answer questions; recent methods supervise only the final answer accuracy using reinforcement learning with verifiable rewards (RLVR). Most RLVR search agents tackle general-domain QA, which limits their relevance to technical AI systems in science, engineering, and medicine. In this work we propose training agents to search and reason over scientific papers: this tests technical question-answering, is directly relevant to real scientists, and builds capabilities that will be crucial to future AI Scientist systems. Concretely, we release a search corpus of 16 million biomedical paper abstracts and construct a challenging factoid QA dataset called PaperSearchQA with 60k samples answerable from the corpus, along with benchmarks. Search agents trained in this environment outperform non-RL retrieval baselines; further quantitative analysis reveals agent behaviors such as planning, reasoning, and self-verification. Our corpus, datasets, and benchmarks are compatible with the popular Search-R1 codebase for RLVR training and are released at https://huggingface.co/collections/jmhb/papersearchqa. Finally, our data creation methods are scalable and easily extendable to other scientific domains.
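To make "supervising only the final answer accuracy" concrete, the following is a minimal sketch of what a verifiable reward for factoid QA could look like. It is an illustration under stated assumptions, not the authors' implementation: the `<answer>...</answer>` tag convention, the normalization steps, and all function names are hypothetical choices made for this example.

```python
import re
import string
from typing import List, Optional


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: a common factoid-QA normalization, not necessarily the paper's.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def extract_answer(rollout: str) -> Optional[str]:
    """Return the content of the last <answer>...</answer> span, if any.

    Assumption: the agent wraps its final answer in <answer> tags.
    """
    matches = re.findall(r"<answer>(.*?)</answer>", rollout, flags=re.DOTALL)
    return matches[-1].strip() if matches else None


def exact_match_reward(rollout: str, gold_answers: List[str]) -> float:
    """Reward 1.0 if the normalized final answer equals any gold answer, else 0.0.

    Only the final answer is scored; intermediate reasoning and search
    queries in the rollout receive no direct supervision.
    """
    predicted = extract_answer(rollout)
    if predicted is None:
        return 0.0
    pred = normalize(predicted)
    return 1.0 if any(pred == normalize(gold) for gold in gold_answers) else 0.0
```

Because the reward depends only on the extracted answer string, it can be checked automatically against the dataset's gold answers, which is what makes it "verifiable" in the RLVR sense.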