Assessing LLMs for Serendipity Discovery in Knowledge Graphs: A Case for Drug Repurposing

November 16, 2025
作者: Mengying Wang, Chenhui Ma, Ao Jiao, Tuo Liang, Pengjun Lu, Shrinidhi Hegde, Yu Yin, Evren Gurkan-Cavusoglu, Yinghui Wu
cs.AI

Abstract

Large Language Models (LLMs) have greatly advanced knowledge graph question answering (KGQA), yet existing systems are typically optimized to return highly relevant but predictable answers. A missing yet desired capability is to exploit LLMs to suggest surprising and novel ("serendipitous") answers. In this paper, we formally define the serendipity-aware KGQA task and propose the SerenQA framework to evaluate LLMs' ability to uncover unexpected insights in scientific KGQA tasks. SerenQA includes a rigorous serendipity metric based on relevance, novelty, and surprise, along with an expert-annotated benchmark derived from the Clinical Knowledge Graph, focused on drug repurposing. Additionally, it features a structured evaluation pipeline encompassing three subtasks: knowledge retrieval, subgraph reasoning, and serendipity exploration. Our experiments reveal that while state-of-the-art LLMs perform well on retrieval, they still struggle to identify genuinely surprising and valuable discoveries, underscoring significant room for future improvement. Our curated resources and extended version are released at: https://cwru-db-group.github.io/serenQA.
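
The abstract characterizes the serendipity metric only at a high level, as a combination of relevance, novelty, and surprise. Below is a minimal, hypothetical sketch of how such a composite score might be computed; the fields, weights, and linear aggregation are illustrative assumptions, not SerenQA's actual definition, which is given in the paper.

```python
# Hypothetical sketch only: SerenQA's real metric is defined in the paper.
# The fields, weights, and weighted-average aggregation here are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class CandidateAnswer:
    relevance: float  # how well the answer addresses the query (assumed in [0, 1])
    novelty: float    # how unfamiliar the answer is given existing knowledge (assumed in [0, 1])
    surprise: float   # how unexpected the answer is in the query context (assumed in [0, 1])

def serendipity_score(ans: CandidateAnswer,
                      w_rel: float = 1.0, w_nov: float = 1.0, w_sur: float = 1.0) -> float:
    """Aggregate the three components into a single score via a weighted average."""
    total = w_rel + w_nov + w_sur
    return (w_rel * ans.relevance + w_nov * ans.novelty + w_sur * ans.surprise) / total

# A drug-repurposing candidate that is relevant but expected should score lower
# than one that balances relevance with novelty and surprise.
print(serendipity_score(CandidateAnswer(relevance=0.9, novelty=0.2, surprise=0.1)))  # ~0.40
print(serendipity_score(CandidateAnswer(relevance=0.7, novelty=0.8, surprise=0.8)))  # ~0.77
```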