NER Retriever: Zero-Shot Named Entity Retrieval with Type-Aware Embeddings

September 4, 2025
Authors: Or Shachar, Uri Katz, Yoav Goldberg, Oren Glickman
cs.AI

Abstract

We present NER Retriever, a zero-shot retrieval framework for ad-hoc Named Entity Retrieval, a variant of Named Entity Recognition (NER) in which the types of interest are not provided in advance; rather, a user-defined type description is used to retrieve documents mentioning entities of that type. Instead of relying on fixed schemas or fine-tuned models, our method builds on the internal representations of large language models (LLMs) to embed both entity mentions and user-provided open-ended type descriptions into a shared semantic space. We show that internal representations, specifically the value vectors from mid-layer transformer blocks, encode fine-grained type information more effectively than the commonly used top-layer embeddings. To refine these representations, we train a lightweight contrastive projection network that aligns type-compatible entities while separating unrelated types. The resulting entity embeddings are compact, type-aware, and well suited to nearest-neighbor search. Evaluated on three benchmarks, NER Retriever significantly outperforms both lexical and dense sentence-level retrieval baselines. Our findings provide empirical support for representation selection within LLMs and demonstrate a practical solution for scalable, schema-free entity retrieval. The NER Retriever codebase is publicly available at https://github.com/ShacharOr100/ner_retriever.
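To make the pipeline concrete, here is a minimal sketch of the three stages the abstract describes: pooling mid-layer value vectors for an entity mention, projecting them with a lightweight head, and scoring against an embedded type description. Everything specific below is an illustrative assumption rather than the paper's configuration: the LLaMA-style backbone loaded through Hugging Face transformers, the layer index, the mean pooling over the mention span, and the projection dimensions. The actual choices and trained weights are in the linked codebase.

```python
# Minimal sketch (illustrative only): mid-layer value-vector embeddings,
# a lightweight projection head, and cosine-similarity scoring.
# Assumptions: a LLaMA-style decoder whose value projection is exposed as
# `self_attn.v_proj`; the model name, layer index, and dimensions are
# hypothetical, not taken from the paper.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-3.1-8B"  # hypothetical backbone choice
MID_LAYER = 17                          # hypothetical "mid-layer" index

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

captured = {}
# Forward hook that grabs the value vectors (v_proj output) at MID_LAYER.
model.model.layers[MID_LAYER].self_attn.v_proj.register_forward_hook(
    lambda mod, inp, out: captured.update(values=out)
)

@torch.no_grad()
def embed_span(text: str, start: int, end: int) -> torch.Tensor:
    """Mean-pool mid-layer value vectors over the tokens covering the
    character span [start, end): an entity mention, or the whole string
    when embedding an open-ended type description."""
    enc = tokenizer(text, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0]
    model(**enc)
    values = captured["values"][0]  # (seq_len, value_dim)
    inside = (offsets[:, 0] >= start) & (offsets[:, 1] <= end) & (offsets[:, 1] > 0)
    return values[inside].float().mean(dim=0)

class ProjectionHead(nn.Module):
    """Lightweight projection into a compact, type-aware space. In the paper
    this is trained contrastively (aligning type-compatible entities while
    separating unrelated types); here it is shown untrained."""
    def __init__(self, in_dim: int, out_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 1024), nn.ReLU(),
                                 nn.Linear(1024, out_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Unit-normalize so a dot product equals cosine similarity for kNN.
        return nn.functional.normalize(self.net(x), dim=-1)

# Usage: embed one mention and one type description, then score them.
sent = "Lionel Messi joined Inter Miami in 2023."
mention = embed_span(sent, 0, len("Lionel Messi"))
head = ProjectionHead(in_dim=mention.shape[-1])  # load trained weights in practice
query = "a professional football player"
type_vec = head(embed_span(query, 0, len(query)))
score = head(mention) @ type_vec                 # cosine similarity
print(float(score))
```

At corpus scale, the projected mention embeddings would be stored in a nearest-neighbor index (e.g., FAISS) and queried with the projected type description; the projection head itself would be trained offline with a contrastive objective that pulls type-compatible mentions together and pushes unrelated types apart, as the abstract describes.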