FinTagging: An LLM-ready Benchmark for Extracting and Structuring Financial Information

Abstract

We introduce FinTagging, the first full-scope, table-aware XBRL benchmark designed to evaluate the structured information extraction and semantic alignment capabilities of large language models (LLMs) in the context of XBRL-based financial reporting. Unlike prior benchmarks that oversimplify XBRL tagging as flat multi-class classification and focus solely on narrative text, FinTagging decomposes the XBRL tagging problem into two subtasks: FinNI for financial entity extraction and FinCL for taxonomy-driven concept alignment. It requires models to jointly extract facts and align them with the full 10k+ US-GAAP taxonomy across both unstructured text and structured tables, enabling realistic, fine-grained evaluation. We assess a diverse set of LLMs under zero-shot settings, systematically analyzing their performance on both subtasks and on overall tagging accuracy. Our results reveal that, while LLMs demonstrate strong generalization in information extraction, they struggle with fine-grained concept alignment, particularly in disambiguating closely related taxonomy entries. These findings highlight the limitations of existing LLMs in fully automating XBRL tagging and underscore the need for improved semantic reasoning and schema-aware modeling to meet the demands of accurate financial disclosure. Code is available in our GitHub repository, and the data is available in our Hugging Face repository.
