PaperRegister: Boosting Flexible-grained Paper Search via Hierarchical Register Indexing
August 14, 2025
Authors: Zhuoqun Li, Xuanang Chen, Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun
cs.AI
Abstract
Paper search is an important activity for researchers and typically involves
using a query that describes a topic to find relevant papers. As research
deepens, paper search requirements may become more flexible, sometimes
involving specific details such as module configuration rather than being
limited to coarse-grained topics. However, previous paper search systems are
unable to meet these flexible-grained requirements, because they mainly
collect paper abstracts to construct the corpus index, and abstracts lack the
detailed information needed to support retrieval by finer-grained queries. In
this work, we propose PaperRegister, which consists of offline hierarchical
indexing and online adaptive retrieval and transforms the traditional
abstract-based index into a hierarchical index tree for paper search, thereby
supporting queries at flexible granularity. Experiments on paper search tasks
across a range of granularities demonstrate that PaperRegister achieves
state-of-the-art performance and particularly excels in fine-grained
scenarios, highlighting its potential as an effective solution for
flexible-grained paper search in real-world applications. Code for this work
is available at https://github.com/Li-Z-Q/PaperRegister.
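
To make the idea of a hierarchical index tree concrete, here is a minimal toy sketch in Python. It is not the PaperRegister implementation: the field names, the sample papers, the token-overlap scoring, and the rule for choosing which tree node to search are all assumptions made purely for illustration. It only shows how an index keyed by tree paths could, in principle, serve both coarse queries (topic) and fine queries (a specific module detail) that an abstract-only index would miss.

```python
from collections import defaultdict

# Hypothetical sample data: each paper is a nested dict of fields.
PAPERS = {
    "paper_a": {
        "topic": "dense retrieval for question answering",
        "method": {
            "encoder": "12 layer transformer encoder",
            "training": "contrastive loss with hard negatives",
        },
    },
    "paper_b": {
        "topic": "image classification with convolutional networks",
        "method": {
            "encoder": "residual network with 50 layers",
            "training": "cross entropy loss with label smoothing",
        },
    },
}


def tokenize(text):
    """Lowercase whitespace tokenization (a stand-in for a real retriever)."""
    return set(text.lower().split())


def build_index(papers):
    """Offline step (toy): flatten nested fields into {path: {paper_id: text}}."""
    index = defaultdict(dict)

    def walk(paper_id, node, path):
        for key, value in node.items():
            child = path + (key,)
            if isinstance(value, dict):
                walk(paper_id, value, child)
            else:
                index[child][paper_id] = value

    for paper_id, fields in papers.items():
        walk(paper_id, fields, ())
    return dict(index)


def search(index, query, top_k=1):
    """Online step (toy): pick the best-matching node(s), then rank papers."""
    q = tokenize(query)

    # Assumed selection rule: prefer nodes whose path names overlap the query.
    # If no path name matches, every node ties at 0 and all are searched.
    def path_score(path):
        return len(q & tokenize(" ".join(path)))

    best = max(path_score(p) for p in index)
    nodes = [p for p in index if path_score(p) == best]

    scores = defaultdict(int)
    for path in nodes:
        for paper_id, text in index[path].items():
            scores[paper_id] += len(q & tokenize(text))
    ranked = sorted(scores, key=lambda pid: (-scores[pid], pid))
    return ranked[:top_k]


INDEX = build_index(PAPERS)
print(search(INDEX, "encoder with 50 layers"))  # fine-grained query
print(search(INDEX, "dense retrieval"))         # coarse-grained query
```

In this sketch the fine-grained query is routed to the hypothetical `("method", "encoder")` node, while the coarse query falls back to searching every node. A real system would replace both the node-selection rule and the overlap score with learned components.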