Éclair -- Extracting Content and Layout with Integrated Reading Order for Documents
February 6, 2025
Authors: Ilia Karmanov, Amala Sanjay Deshmukh, Lukas Voegtle, Philipp Fischer, Kateryna Chumachenko, Timo Roman, Jarno Seppänen, Jupinder Parmar, Joseph Jennings, Andrew Tao, Karan Sapra
cs.AI
Abstract
Optical Character Recognition (OCR) technology is widely used to extract text
from images of documents, facilitating efficient digitization and data
retrieval. However, merely extracting text is insufficient when dealing with
complex documents. Fully comprehending such documents requires an understanding
of their structure -- including formatting, formulas, tables, and the reading
order of multiple blocks and columns across multiple pages -- as well as
semantic information for detecting elements like footnotes and image captions.
This comprehensive understanding is crucial for downstream tasks such as
retrieval, document question answering, and data curation for training Large
Language Models (LLMs) and Vision Language Models (VLMs). To address this, we
introduce Éclair, a general-purpose text-extraction tool specifically
designed to process a wide range of document types. Given an image, Éclair is
able to extract formatted text in reading order, along with bounding boxes and
their corresponding semantic classes. To thoroughly evaluate these novel
capabilities, we introduce a diverse, human-annotated benchmark for
document-level OCR and semantic classification. Éclair achieves
state-of-the-art accuracy on this benchmark, outperforming other methods across
key metrics. Additionally, we evaluate Éclair on established benchmarks,
demonstrating its versatility and strength across several evaluation standards.
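To illustrate why semantic classes matter for downstream data curation, here is a minimal sketch of consuming page-level output of the kind the abstract describes: a list of elements, each with a bounding box, a semantic class, and formatted text, already sorted in reading order. The `Element` record, its field names, the class labels, and the `body_text` helper are all hypothetical assumptions for illustration; they are not Éclair's actual output format or API.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical element record: field names and class labels are illustrative
# assumptions, not Éclair's actual output schema.
@dataclass
class Element:
    bbox: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels
    semantic_class: str                       # e.g. "Text", "Footnote", "Page-header"
    text: str                                 # formatted text (e.g. markdown)


# Classes assumed to be unwanted when building a plain body-text corpus.
SKIP_CLASSES = {"Page-header", "Page-footer", "Footnote"}


def body_text(elements: List[Element]) -> str:
    """Join element texts in the given order (assumed to be reading order),
    dropping elements whose semantic class is in SKIP_CLASSES."""
    kept = [e.text for e in elements if e.semantic_class not in SKIP_CLASSES]
    return "\n\n".join(kept)


# A two-column page: reading order puts the whole left column before the right one.
page = [
    Element((50, 20, 550, 40), "Page-header", "Journal of Examples, Vol. 1"),
    Element((50, 60, 550, 90), "Title", "# A Sample Document"),
    Element((50, 100, 290, 400), "Text", "Left column paragraph."),
    Element((310, 100, 550, 400), "Text", "Right column paragraph."),
    Element((50, 700, 550, 720), "Footnote", "1. A footnote."),
]

print(body_text(page))
```

Because the filtering relies only on the semantic class and the order of the list, the same output could instead be used to keep footnotes or captions for a retrieval index; the bounding boxes are carried along for tasks such as grounding answers to page regions.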