PaperRegister: Boosting Flexible-grained Paper Search via Hierarchical Register Indexing
August 14, 2025
Authors: Zhuoqun Li, Xuanang Chen, Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun
cs.AI
Abstract
Paper search is an important activity for researchers, typically using a
query that describes a topic to find relevant papers. As research deepens,
paper search requirements may become more flexible, sometimes involving
specific details such as module configuration rather than being limited to
coarse-grained topics. However, previous paper search systems are unable to
meet these flexible-grained requirements, as they mainly build their corpus
index from paper abstracts, which lack the detailed information needed to
support retrieval by finer-grained queries. In this work, we propose
PaperRegister, which consists of offline hierarchical indexing and online
adaptive retrieval and transforms the traditional abstract-based index into a
hierarchical index tree for paper search, thereby supporting queries at
flexible granularity. Experiments on paper search tasks across a range of
granularities demonstrate that PaperRegister achieves state-of-the-art
performance and particularly excels in fine-grained scenarios, highlighting
its potential as an effective solution for flexible-grained paper search in
real-world applications. Code for this work is available at
https://github.com/Li-Z-Q/PaperRegister.
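To make the contrast between an abstract-only index and a hierarchical index tree concrete, here is a toy sketch in Python. It is not PaperRegister's implementation: the abstract does not specify the tree's levels or the retrieval model, so the `IndexNode` type, the node labels ("abstract", "method", "module config"), the word-overlap scoring, and the best-node aggregation are all hypothetical choices made for illustration, and the sketch does not attempt to model the online adaptive retrieval step. It only shows why a fine-grained query can miss every paper when only abstracts are indexed, yet match a deeper node of a per-paper index tree.

```python
from __future__ import annotations

from dataclasses import dataclass, field

PUNCT = ".,:;()"


@dataclass
class IndexNode:
    """One node of a hypothetical per-paper index tree (labels are illustrative)."""
    level: str
    text: str
    children: list[IndexNode] = field(default_factory=list)


def tokens(text: str) -> set[str]:
    """Lowercased word set; a stand-in for a real retriever's representation."""
    return {w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)}


def overlap(query: str, text: str) -> float:
    """Fraction of query tokens found in the text (toy relevance score)."""
    q = tokens(query)
    return len(q & tokens(text)) / len(q) if q else 0.0


def best_node_score(node: IndexNode, query: str) -> float:
    """Score a paper by its best-matching node at any depth of its tree."""
    child_scores = [best_node_score(c, query) for c in node.children]
    return max([overlap(query, node.text), *child_scores])


def search(index: dict[str, IndexNode], query: str,
           hierarchical: bool = True) -> list[tuple[str, float]]:
    """Rank papers; with hierarchical=False only the root (abstract) node is scored."""
    scored = [
        (pid, best_node_score(root, query) if hierarchical else overlap(query, root.text))
        for pid, root in index.items()
    ]
    # Stable sort, so ties keep insertion order.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical two-paper corpus: only paper-A has a fine-grained detail node.
index = {
    "paper-A": IndexNode(
        "abstract", "A retrieval model for scientific paper search.",
        [IndexNode("method", "Dense retriever with a BERT encoder.",
                   [IndexNode("module config", "Encoder uses 12 layers and mean pooling.")])],
    ),
    "paper-B": IndexNode("abstract", "A survey of scientific paper search systems."),
}

if __name__ == "__main__":
    query = "12 layers mean pooling"
    print(search(index, query, hierarchical=False))  # abstract-only: every score is 0.0
    print(search(index, query))  # tree: paper-A ranks first with score 1.0
```

Scoring each paper by its best-matching node is one simple way to let a single query reach whichever granularity it targets; a practical system would replace word overlap with a learned or lexical retriever such as BM25 or a dense encoder.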