RaV-IDP: A Reconstruction-as-Validation Framework for Faithful Intelligent Document Processing

April 26, 2026
Author: Pritesh Jha
cs.AI

Abstract

Intelligent document processing pipelines extract structured entities (tables, images, and text) from documents for use in downstream systems such as knowledge bases, retrieval-augmented generation, and analytics. A persistent limitation of existing pipelines is that extraction output is produced without any intrinsic mechanism to verify whether it faithfully represents the source. Model-internal confidence scores measure inference certainty, not correspondence to the document, and extraction errors pass silently into downstream consumers. We present Reconstruction as Validation (RaV-IDP), a document processing pipeline that introduces reconstruction as a first-class architectural component. After each entity is extracted, a dedicated reconstructor renders the extracted representation back into a form comparable to the original document region, and a comparator scores fidelity between the reconstruction and the unmodified source crop. This fidelity score is a grounded, label-free quality signal. When fidelity falls below a per-entity-type threshold, a structured GPT-4.1 vision fallback is triggered and the validation loop repeats. We enforce a bootstrap constraint: the comparator always anchors against the original document region, never against the extraction, preventing the validation from becoming circular. We further propose a per-stage evaluation framework pairing each pipeline component with an appropriate benchmark. The code pipeline is publicly available at https://github.com/pritesh-2711/RaV-IDP for experimentation and use.
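
The extract, reconstruct, compare, and fallback loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the names `validate_entity`, `ValidationResult`, `THRESHOLDS`, and `text_similarity` are hypothetical, the threshold values are placeholders, the document region is stood in for by a plain string rather than an image crop, and the fallback is an arbitrary callable rather than a GPT-4.1 vision call.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable

# Hypothetical per-entity-type thresholds; placeholder values, not taken from the paper.
THRESHOLDS = {"text": 0.90, "table": 0.80, "image": 0.70}


@dataclass
class ValidationResult:
    extraction: str
    fidelity: float
    used_fallback: bool
    passed: bool


def validate_entity(
    source_crop: str,
    entity_type: str,
    extract: Callable[[str], str],
    fallback: Callable[[str], str],
    reconstruct: Callable[[str], str],
    compare: Callable[[str, str], float],
    max_fallbacks: int = 1,
) -> ValidationResult:
    """Extract, reconstruct, score against the source, and retry via fallback if needed."""
    threshold = THRESHOLDS[entity_type]
    extraction = extract(source_crop)
    # Bootstrap constraint: the reference is always the source crop, never the extraction.
    fidelity = compare(reconstruct(extraction), source_crop)
    attempts = 0
    while fidelity < threshold and attempts < max_fallbacks:
        # Assumed stand-in for the structured vision fallback described in the abstract.
        extraction = fallback(source_crop)
        fidelity = compare(reconstruct(extraction), source_crop)
        attempts += 1
    return ValidationResult(extraction, fidelity, attempts > 0, fidelity >= threshold)


def text_similarity(reconstruction: str, source_crop: str) -> float:
    """Toy comparator: character-level similarity in [0, 1]."""
    return SequenceMatcher(None, reconstruction, source_crop).ratio()


# Toy usage: the primary extractor truncates the text, so the fallback is triggered.
crop = "Total revenue: 1,204 USD"
result = validate_entity(
    source_crop=crop,
    entity_type="text",
    extract=lambda c: c[:9],      # deliberately lossy primary extractor
    fallback=lambda c: c,         # idealized fallback that recovers the full text
    reconstruct=lambda e: e,      # for plain text, reconstruction is the identity
    compare=text_similarity,
)
```

In this sketch the comparator never receives the extraction as its reference argument, which is the property the bootstrap constraint is meant to guarantee. Bounding the number of fallback attempts with `max_fallbacks` is an assumption added here so the loop always terminates; the abstract does not state how many times the loop may repeat.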