LMDX: Language Model-based Document Information Extraction and Localization

Abstract

Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP), improving the state of the art on many existing tasks and exhibiting emergent capabilities. However, LLMs have not yet been successfully applied to semi-structured document information extraction, which is at the core of many document processing workflows and consists of extracting key entities from a visually rich document (VRD) given a predefined target schema. The main obstacles to LLM adoption in that task have been the absence of layout encoding within LLMs, which is critical for high-quality extraction, and the lack of a grounding mechanism ensuring the answer is not hallucinated. In this paper, we introduce Language Model-based Document Information Extraction and Localization (LMDX), a methodology to adapt arbitrary LLMs for document information extraction. LMDX can extract singular, repeated, and hierarchical entities, both with and without training data, while providing grounding guarantees and localizing the entities within the document. In particular, we apply LMDX to the PaLM 2-S LLM and evaluate it on the VRDU and CORD benchmarks, setting a new state of the art and showing how LMDX enables the creation of high-quality, data-efficient parsers.

