DocGraphLM: Documental Graph Language Model for Information Extraction
January 5, 2024
Authors: Dongsheng Wang, Zhiqiang Ma, Armineh Nourbakhsh, Kang Gu, Sameena Shah
cs.AI
Abstract
Advances in Visually Rich Document Understanding (VrDU) have enabled information extraction and question answering over documents with complex layouts. Two families of architectures have emerged: transformer-based models inspired by LLMs, and Graph Neural Networks. In this paper, we introduce DocGraphLM, a novel framework that combines pre-trained language models with graph semantics. To achieve this, we propose 1) a joint encoder architecture to represent documents, and 2) a novel link prediction approach to reconstruct document graphs. DocGraphLM predicts both the directions and distances between nodes using a convergent joint loss function that prioritizes neighborhood restoration and down-weights distant node detection. Our experiments on three SotA datasets show consistent improvements on IE and QA tasks with the adoption of graph features. Moreover, we report that the graph features, although constructed solely through link prediction, accelerate convergence during training.
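The abstract does not give the exact form of the joint loss, so the following is only a minimal sketch of one plausible reading: a cross-entropy term over discretized direction classes combined with a distance-regression term whose weight decays with the true inter-node distance, so that neighborhood restoration dominates and errors on far-apart pairs are down-weighted. The function name, the inverse-distance weighting `1 / (1 + d)`, and the `alpha` balance factor are all illustrative assumptions, not details from the paper.

```python
import math

def direction_distance_loss(dir_logits, dir_labels, dist_pred, dist_true, alpha=1.0):
    """Hypothetical joint loss over node pairs.

    dir_logits: per-pair logits over direction classes (list of lists)
    dir_labels: per-pair true direction class index
    dist_pred / dist_true: predicted and true inter-node distances
    alpha: assumed balance factor between the two terms
    """
    total = 0.0
    for logits, label, dp, dt in zip(dir_logits, dir_labels, dist_pred, dist_true):
        # numerically stable softmax cross-entropy for the direction head
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        ce = log_z - logits[label]
        # inverse-distance weight: near-neighbor pairs matter more, so
        # distance errors on remote pairs contribute less to the loss
        w = 1.0 / (1.0 + dt)
        total += ce + alpha * w * (dp - dt) ** 2
    return total / len(dir_labels)
```

Under this weighting, the same absolute distance error costs more on a nearby pair than on a remote one, which matches the abstract's emphasis on prioritizing neighborhood restoration.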