PaSa: An LLM Agent for Comprehensive Academic Paper Search

Abstract

We introduce PaSa, an advanced Paper Search agent powered by large language models. PaSa autonomously makes a series of decisions, including invoking search tools, reading papers, and selecting relevant references, to obtain comprehensive and accurate results for complex scholarly queries. We optimize PaSa using reinforcement learning with a synthetic dataset, AutoScholarQuery, which includes 35k fine-grained academic queries and corresponding papers sourced from top-tier AI conference publications. Additionally, we develop RealScholarQuery, a benchmark of real-world academic queries, to assess PaSa's performance in more realistic scenarios. Despite being trained on synthetic data, PaSa significantly outperforms existing baselines on RealScholarQuery, including Google, Google Scholar, Google with GPT-4o for paraphrased queries, ChatGPT (search-enabled GPT-4o), GPT-o1, and PaSa-GPT-4o (PaSa implemented by prompting GPT-4o). Notably, PaSa-7B surpasses the best Google-based baseline, Google with GPT-4o, by 37.78% in recall@20 and 39.90% in recall@50. It also exceeds PaSa-GPT-4o by 30.36% in recall and 4.25% in precision. The model, datasets, and code are available at https://github.com/bytedance/pasa.
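The abstract reports recall@20, recall@50, recall, and precision without defining them. As a minimal sketch, assuming the standard information-retrieval definitions (the abstract does not confirm the exact evaluation protocol), the metrics could be computed as follows. The function names and paper IDs are hypothetical and are not taken from the PaSa codebase.

```python
from typing import Iterable, Sequence


def recall_at_k(ranked: Sequence[str], relevant: Iterable[str], k: int) -> float:
    """Fraction of the relevant papers that appear among the top-k ranked results."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0  # assumption: recall is defined as 0 when nothing is relevant
    top_k = set(ranked[:k])
    return len(top_k & relevant_set) / len(relevant_set)


def precision(retrieved: Iterable[str], relevant: Iterable[str]) -> float:
    """Fraction of the retrieved papers that are relevant."""
    retrieved_set = set(retrieved)
    if not retrieved_set:
        return 0.0  # assumption: precision is defined as 0 for an empty result set
    return len(retrieved_set & set(relevant)) / len(retrieved_set)


# Hypothetical paper IDs, for illustration only.
ranked_results = ["p1", "p7", "p3", "p9", "p2"]
ground_truth = {"p1", "p2", "p4"}

r_at_2 = recall_at_k(ranked_results, ground_truth, k=2)  # only p1 is in the top 2
r_at_5 = recall_at_k(ranked_results, ground_truth, k=5)  # p1 and p2 are in the top 5
prec = precision(ranked_results, ground_truth)           # 2 of 5 retrieved are relevant
```

Recall@k suits ranked baselines such as a search engine's result list. An agent that returns an unranked set of papers is more naturally scored with plain recall and precision, which may be why the abstract reports both kinds of figures.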
