LMDX: Language Model-based Document Information Extraction and Localization

Abstract

Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP), improving the state of the art on many existing tasks and exhibiting emergent capabilities. However, LLMs have not yet been successfully applied to semi-structured document information extraction, which is at the core of many document processing workflows and consists of extracting key entities from a visually rich document (VRD) given a predefined target schema. The main obstacles to LLM adoption in that task have been the absence of layout encoding within LLMs, which is critical for high-quality extraction, and the lack of a grounding mechanism ensuring that the answer is not hallucinated. In this paper, we introduce Language Model-based Document Information Extraction and Localization (LMDX), a methodology to adapt arbitrary LLMs for document information extraction. LMDX can extract singular, repeated, and hierarchical entities, both with and without training data, while providing grounding guarantees and localizing the entities within the document. In particular, we apply LMDX to the PaLM 2-S LLM and evaluate it on the VRDU and CORD benchmarks, setting a new state of the art and showing how LMDX enables the creation of high-quality, data-efficient parsers.
