Fine-Grained Detection of Context-Grounded Hallucinations Using LLMs

Abstract

Context-grounded hallucinations are cases where model outputs contain information that cannot be verified against the source text. We study the applicability of LLMs for localizing such hallucinations, as a more practical alternative to existing complex evaluation pipelines. In the absence of established benchmarks for meta-evaluation of hallucination localization, we construct one tailored to LLMs, involving a challenging human annotation of over 1,000 examples. We complement the benchmark with an LLM-based evaluation protocol and verify its quality in a human evaluation. Since existing representations of hallucinations limit the types of errors that can be expressed, we propose a new representation based on free-form textual descriptions, which captures the full range of possible errors. We conduct a comprehensive study evaluating four large-scale LLMs. The results highlight the benchmark's difficulty: the best model achieves an F1 score of only 0.67. Through careful analysis, we offer insights into optimal prompting strategies for the task and identify the main factors that make it challenging for LLMs: (1) a tendency to incorrectly flag missing details as inconsistent, despite being instructed to check only facts in the output; and (2) difficulty with outputs containing factually correct information that is absent from the source, and thus not verifiable, because it aligns with the model's parametric knowledge.
