UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis
March 20, 2025
Authors: Jiawei Wang, Kai Hu, Qiang Huo
cs.AI
Abstract
Document structure analysis, also known as document layout analysis, is crucial for
understanding both the physical layout and logical structure of documents,
and supports applications such as information retrieval, document summarization,
and knowledge extraction. Hierarchical Document Structure Analysis (HDSA) specifically aims to
restore the hierarchical structure of documents created using authoring
software with hierarchical schemas. Previous research has primarily followed
two approaches: one focuses on tackling specific subtasks of HDSA in isolation,
such as table detection or reading order prediction, while the other adopts a
unified framework that uses multiple branches or modules, each designed to
address a distinct task. In this work, we propose a unified relation prediction
approach for HDSA, called UniHDSA, which treats various HDSA subtasks as
relation prediction problems and consolidates relation prediction labels into a
unified label space. This allows a single relation prediction module to handle
multiple tasks simultaneously, for both page-level and document-level
structure analysis. To validate the effectiveness of UniHDSA, we develop a
multimodal end-to-end system based on Transformer architectures. Extensive
experimental results demonstrate that our approach achieves state-of-the-art
performance on a hierarchical document structure analysis benchmark,
Comp-HRDoc, and competitive results on a large-scale document layout analysis
dataset, DocLayNet, effectively illustrating the superiority of our method
across all subtasks. The Comp-HRDoc benchmark and UniHDSA's configurations are
publicly available at https://github.com/microsoft/CompHRDoc.
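
To make the idea of a unified label space concrete, the following is a minimal sketch and not the authors' implementation. It assumes three illustrative relation types (grouping text lines into a region, reading-order succession, and parent-child hierarchy) plus a "no relation" label; the names Relation, predict_relations, and split_by_task, as well as the toy scores, are hypothetical. A single classifier scores each ordered pair of elements over all labels at once, and the per-task outputs are then read off from that one prediction.

```python
from enum import IntEnum


class Relation(IntEnum):
    """Hypothetical unified label space shared by all subtasks."""
    NONE = 0           # no relation between the two elements
    SAME_REGION = 1    # both text lines belong to one layout region
    NEXT_IN_ORDER = 2  # second element follows the first in reading order
    PARENT_OF = 3      # first element is the hierarchical parent of the second


def predict_relations(scores):
    """Pick the highest-scoring label for each ordered pair (i, j).

    `scores` maps (i, j) to a list with one score per Relation label,
    standing in for the output of a single relation prediction module.
    Pairs whose best label is NONE are dropped.
    """
    relations = {}
    for pair, label_scores in scores.items():
        best = max(range(len(label_scores)), key=label_scores.__getitem__)
        if best != Relation.NONE:
            relations[pair] = Relation(best)
    return relations


def split_by_task(relations):
    """Group the predicted pairs by relation type, one list per subtask."""
    by_task = {rel: [] for rel in Relation if rel is not Relation.NONE}
    for pair, rel in sorted(relations.items()):
        by_task[rel].append(pair)
    return by_task


# Toy scores for three elements (made-up numbers for illustration only).
toy_scores = {
    (0, 1): [0.1, 0.7, 0.1, 0.1],
    (1, 2): [0.2, 0.1, 0.6, 0.1],
    (0, 2): [0.1, 0.1, 0.1, 0.7],
    (2, 0): [0.9, 0.05, 0.03, 0.02],
}
tasks = split_by_task(predict_relations(toy_scores))
```

In this sketch, adding another subtask only means adding a label to Relation rather than a new prediction branch, which is the kind of consolidation the abstract describes.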