Source or It Didn't Happen: A Multi-Agent Framework for Citation Hallucination Detection

Abstract

Large language models are increasingly used in scientific writing, yet they can fabricate citation-shaped references that appear plausible but fail bibliographic verification. Existing detectors often reduce verification to binary found/not-found decisions and rely on brittle parsing or incomplete retrieval, offering little field-level signal to auditors. We reframe citation hallucination detection as taxonomy-aligned field-level adjudication and introduce a 12-code taxonomy spanning Real, Potential, and Hallucinated citations. Based on this taxonomy, we build CiteTracer, a cascading multi-agent detector that extracts structured citations from PDF and BibTeX, retrieves evidence through cache lookup, URL fetch, scholar connectors, and web search, applies deterministic field matching, and routes ambiguous cases to class-specialist judgers. We release a benchmark of 2,450 synthetic citations built from real seed citations with controlled LLM mutations, paired with 957 real-world fabricated citations drawn from ICLR 2026 and an anonymous conference's desk-rejected submissions. CiteTracer reaches 97.1% accuracy on the synthetic benchmark, with class-level F1 scores of 97.0, 95.8, and 98.5 for Real, Potential, and Hallucinated, respectively, and detects 97.1% of fabrications on the real-world set without abstaining. Code: https://github.com/aaFrostnova/CiteTracer.
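
To make the idea of deterministic field matching followed by routing concrete, the following is a minimal Python sketch under stated assumptions. It is not taken from the CiteTracer codebase: the `Citation` fields, the similarity thresholds (0.9 and 0.8), the surname heuristic, and the `needs_judger` label are all hypothetical choices made for illustration. Only the `Real` label comes from the taxonomy named above.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Citation:
    # Hypothetical minimal schema; the actual extracted fields may differ.
    title: str
    authors: list[str]
    year: int
    venue: str


def normalize(text: str) -> str:
    # Lowercase, replace punctuation with spaces, collapse whitespace.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def surnames(authors: list[str]) -> set[str]:
    # Assumes "First Last" ordering; takes the last token as the surname.
    return {normalize(a).split()[-1] for a in authors if normalize(a)}


def match_fields(claimed: Citation, evidence: Citation) -> dict[str, bool]:
    # Thresholds are illustrative assumptions, not values from the paper.
    return {
        "title": similarity(claimed.title, evidence.title) >= 0.9,
        "authors": surnames(claimed.authors) == surnames(evidence.authors),
        "year": claimed.year == evidence.year,
        "venue": similarity(claimed.venue, evidence.venue) >= 0.8,
    }


def triage(claimed: Citation, evidence: Optional[Citation]) -> tuple[str, dict[str, bool]]:
    # evidence is None when retrieval returned nothing.
    if evidence is None:
        return "needs_judger", {}
    report = match_fields(claimed, evidence)
    if all(report.values()):
        return "Real", report
    # Any mismatch is passed on together with the field-level report.
    return "needs_judger", report


claimed = Citation("A Placeholder Title for Testing", ["Ada Example", "Bo Sample"], 2021, "Example Conf")
evidence = Citation("A Placeholder Title for Testing", ["Ada Example", "Bo Sample"], 2020, "Example Conf")
label, report = triage(claimed, evidence)
print(label, [field for field, ok in report.items() if not ok])  # needs_judger ['year']
```

In this sketch only a clean match on every field is resolved deterministically; any mismatch or missing evidence is handed on with the per-field report, which is the kind of field-level signal a downstream judger or human auditor could use.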