EmpiriGraph-Psy: A Dataset and LLM Pipeline for Extracting Empirical Relation Graphs from Psychology Abstracts

Abstract

Existing scientific relation extraction benchmarks mainly target domains such as computer science, where entities are tasks, methods, datasets, materials, or metrics. This leaves a gap in variable-oriented empirical fields such as psychology, where findings are expressed as relations among constructs, measurements, interventions, and outcomes. We introduce variable-centered empirical graph extraction, the task of mapping scientific abstracts to typed graphs whose nodes are normalized variables and whose edges represent empirical and hierarchical relations. To support this task, we construct EmpiriGraph-Psy, a benchmark of 210 psychology abstracts annotated by domain-trained annotators with normalized variables, concept hierarchies, empirical relation types, and validation states. We evaluate frontier and open-weight LLMs using both direct extraction and a staged graph-construction pipeline that separates variable extraction, normalization, hierarchy construction, evidence selection, relation extraction, and edge validation. The staged pipeline substantially outperforms direct extraction, with the best configuration achieving a macro-F1 of 0.74. Error analysis shows that moderation relations and concept hierarchies remain the most challenging cases, highlighting the difficulty of extracting higher-order empirical claims and implicit abstraction structure from scientific abstracts.
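As a rough illustration of the task output and the reported metric, the sketch below represents a typed graph as a set of edges between normalized variables and computes a macro-F1 over relation types. The abstract does not specify the schema or the exact scoring protocol, so everything here is an assumption: the `Edge` class, the `moderator` field, the relation labels (`negative_association`, `positive_association`, `is_a`), the example variables, and the choice to score by exact edge match with an unweighted average over relation types.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Edge:
    """One typed edge between normalized variables (hypothetical schema)."""
    source: str                      # normalized variable name
    target: str                      # normalized variable name
    relation: str                    # assumed label, e.g. "is_a" for hierarchy
    moderator: Optional[str] = None  # assumed slot for moderation relations


def macro_f1(gold: Set[Edge], pred: Set[Edge]) -> float:
    """Unweighted mean of per-relation-type F1, counting exact edge matches.

    This is one plausible reading of "macro-F1"; the benchmark's actual
    matching rules may differ.
    """
    labels = {e.relation for e in gold | pred}
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        g = {e for e in gold if e.relation == label}
        p = {e for e in pred if e.relation == label}
        tp = len(g & p)
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Invented example: the prediction gets the direction of one relation wrong.
gold = {
    Edge("social support", "depression", "negative_association"),
    Edge("rumination", "depression", "positive_association"),
    Edge("depression", "mental health outcome", "is_a"),
}
pred = {
    Edge("social support", "depression", "negative_association"),
    Edge("rumination", "depression", "negative_association"),
    Edge("depression", "mental health outcome", "is_a"),
}
score = macro_f1(gold, pred)
print(round(score, 3))
```

Because the average is unweighted across relation types, a rare type that is consistently missed (here `positive_association`) lowers the score as much as a frequent one, which is one reason difficult categories such as moderation can weigh heavily on a macro-averaged result.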