

ExStrucTiny: A Benchmark for Schema-Variable Structured Information Extraction from Document Images

February 12, 2026
Authors: Mathieu Sibue, Andres Muñoz Garza, Samuel Mensah, Pranav Shetty, Zhiqiang Ma, Xiaomo Liu, Manuela Veloso
cs.AI

Abstract

Enterprise documents, such as forms and reports, embed critical information for downstream applications like data archiving, automated workflows, and analytics. Although generalist Vision Language Models (VLMs) perform well on established document understanding benchmarks, their ability to conduct holistic, fine-grained structured extraction across diverse document types and flexible schemas is not well studied. Existing Key Entity Extraction (KEE), Relation Extraction (RE), and Visual Question Answering (VQA) datasets are limited by narrow entity ontologies, simple queries, or homogeneous document types, often overlooking the need for adaptable, structured extraction. To address these gaps, we introduce ExStrucTiny, a new benchmark dataset for structured Information Extraction (IE) from document images that unifies aspects of KEE, RE, and VQA. Built through a novel pipeline combining manually annotated samples with human-validated synthetic ones, ExStrucTiny covers a wider range of document types and extraction scenarios. We evaluate open and closed VLMs on this benchmark, highlighting challenges such as schema adaptation, query under-specification, and answer localization. We hope our work provides a bedrock for improving generalist models for structured IE in documents.
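To make the task concrete, below is a minimal, purely illustrative sketch of what a schema-variable structured extraction query and its expected JSON answer might look like. The field names, schema layout, and adherence check are assumptions chosen for illustration; they are not ExStrucTiny's actual format, which is defined in the paper.

```python
import json

# Hypothetical caller-supplied schema: the requested fields vary per
# document type (an invoice in this example), which is what makes the
# extraction "schema-variable" rather than tied to a fixed ontology.
schema = {
    "vendor_name": "string",
    "invoice_date": "string (ISO 8601)",
    "line_items": [{"description": "string", "amount": "number"}],
}

# A prompt like this, paired with the document image, would be sent to a VLM.
prompt = (
    "Extract the following fields from the document image and reply "
    "with JSON matching this schema:\n" + json.dumps(schema, indent=2)
)

# Hypothetical VLM response; in structured IE, outputs are judged both on
# conforming to the requested schema and on grounding values in the image.
response = json.dumps({
    "vendor_name": "Acme Corp",
    "invoice_date": "2026-01-15",
    "line_items": [{"description": "Widgets", "amount": 120.0}],
})

parsed = json.loads(response)
assert set(parsed) == set(schema)  # simple schema-adherence check
print(parsed["vendor_name"])
```

Under this framing, the challenges the abstract names map naturally onto failure modes: producing keys that drift from the requested schema (schema adaptation), guessing when a query admits multiple plausible fields (query under-specification), and returning values not traceable to a region of the image (answer localization).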