

FS-DAG: Few Shot Domain Adapting Graph Networks for Visually Rich Document Understanding

May 22, 2025
作者: Amit Agarwal, Srikant Panda, Kulbhushan Pachauri
cs.AI

Abstract

In this work, we propose Few Shot Domain Adapting Graph (FS-DAG), a scalable and efficient model architecture for visually rich document understanding (VRDU) in few-shot settings. FS-DAG leverages domain-specific and language/vision-specific backbones within a modular framework to adapt to diverse document types with minimal data. The model is robust to practical challenges such as handling OCR errors, misspellings, and domain shifts, which are critical in real-world deployments. FS-DAG is highly performant with fewer than 90M parameters, making it well suited for complex real-world Information Extraction (IE) tasks where computational resources are limited. We demonstrate FS-DAG's capability through extensive experiments on information extraction tasks, showing significant improvements in convergence speed and performance compared to state-of-the-art methods. Additionally, this work highlights the ongoing progress in developing smaller, more efficient models that do not compromise on performance. Code: https://github.com/oracle-samples/fs-dag
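To make the graph-based framing of the abstract concrete: a common pattern in graph models for VRDU is to treat OCR tokens as nodes, connect spatially nearby tokens, and propagate fused text/vision features over the resulting graph. The sketch below is illustrative only — the `build_adjacency` and `message_pass` functions are hypothetical and do not reproduce FS-DAG's actual architecture, which is detailed in the paper and repository.

```python
import numpy as np

def build_adjacency(boxes, k=2):
    """Connect each token node to its k nearest neighbours by bounding-box centre.

    boxes: list of (x0, y0, x1, y1) OCR token boxes.
    Returns a symmetric adjacency matrix with self-loops.
    """
    centres = np.array([[(x0 + x1) / 2.0, (y0 + y1) / 2.0]
                        for x0, y0, x1, y1 in boxes])
    n = len(boxes)
    adj = np.eye(n)  # self-loops keep each node's own features in the mix
    for i in range(n):
        dists = np.linalg.norm(centres - centres[i], axis=1)
        for j in np.argsort(dists)[1:k + 1]:  # skip index 0 (the node itself)
            adj[i, j] = adj[j, i] = 1.0
    return adj

def message_pass(features, adj):
    """One round of mean-aggregation message passing (GCN-style).

    features: (n, d) node feature matrix, e.g. concatenated text+vision embeddings.
    """
    degree = adj.sum(axis=1, keepdims=True)
    return adj @ features / degree
```

A downstream IE head would then classify each updated node embedding into field types (e.g. invoice number, date); stacking a few such rounds lets label information flow between spatially related tokens.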

