EmpiriGraph-Psy: A Dataset and LLM Pipeline for Extracting Empirical Relation Graphs from Psychology Abstracts

Abstract

Existing scientific relation extraction benchmarks mainly target domains such as computer science, where entities are tasks, methods, datasets, materials, or metrics. This leaves a gap in variable-oriented empirical fields such as psychology, where findings are expressed as relations among constructs, measurements, interventions, and outcomes. We introduce variable-centered empirical graph extraction, the task of mapping scientific abstracts to typed graphs whose nodes are normalized variables and whose edges represent empirical and hierarchical relations. To support this task, we construct EmpiriGraph-Psy, a benchmark of 210 psychology abstracts annotated by domain-trained annotators with normalized variables, concept hierarchies, empirical relation types, and validation states. We evaluate frontier and open-weight LLMs using both direct extraction and a staged graph-construction pipeline that separates variable extraction, normalization, hierarchy construction, evidence selection, relation extraction, and edge validation. The staged pipeline substantially outperforms direct extraction, with the best configuration achieving a macro-F1 of 0.74. Error analysis shows that moderation relations and concept hierarchies remain the most challenging cases, highlighting the difficulty of extracting higher-order empirical claims and implicit abstraction structure from scientific abstracts.
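As a rough illustration of the task output and the reported metric, the sketch below represents a typed graph as a set of edges between normalized variables and computes macro-F1 as the unweighted mean of per-relation-type F1 under exact edge matching. The abstract does not specify the schema, the relation label set, or the matching criterion, so the `Edge` fields, the labels ("association", "moderation", "is_a"), and the example variables are all hypothetical. Treating moderation as a single source-target edge is a further simplification made here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One typed edge between two normalized variables (hypothetical schema)."""
    source: str    # normalized variable name
    target: str    # normalized variable name
    relation: str  # hypothetical label: "association", "moderation", "is_a", ...


def macro_f1(gold: set[Edge], pred: set[Edge]) -> float:
    """Unweighted mean of per-relation-type F1, using exact edge matching.

    Assumption: an edge counts as correct only if source, target, and
    relation all match a gold edge. The benchmark may score differently.
    """
    relations = {e.relation for e in gold | pred}
    scores = []
    for rel in relations:
        g = {e for e in gold if e.relation == rel}
        p = {e for e in pred if e.relation == rel}
        tp = len(g & p)
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Invented example graphs, not taken from the dataset.
gold = {
    Edge("mindfulness", "stress", "association"),
    Edge("social support", "stress", "association"),
    Edge("test anxiety", "anxiety", "is_a"),
}
pred = {
    Edge("mindfulness", "stress", "association"),
    Edge("test anxiety", "anxiety", "is_a"),
    Edge("gender", "stress", "moderation"),  # spurious edge, absent from gold
}

score = macro_f1(gold, pred)
```

Because every relation type contributes equally to the average, a rare type such as moderation can pull the overall score down sharply even when the more frequent types are extracted well.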