FS-DAG: Few Shot Domain Adapting Graph Networks for Visually Rich Document Understanding

Abstract


In this work, we propose Few Shot Domain Adapting Graph (FS-DAG), a scalable and efficient model architecture for visually rich document understanding (VRDU) in few-shot settings. FS-DAG leverages domain-specific and language/vision-specific backbones within a modular framework to adapt to diverse document types with minimal data. The model is robust to practical challenges such as OCR errors, misspellings, and domain shifts, which are critical in real-world deployments. FS-DAG is highly performant with fewer than 90M parameters, making it well suited to complex real-world Information Extraction (IE) applications where computational resources are limited. We demonstrate FS-DAG's capability through extensive experiments on information extraction tasks, showing significant improvements in convergence speed and performance over state-of-the-art methods. Additionally, this work highlights ongoing progress in developing smaller, more efficient models that do not compromise on performance. Code: https://github.com/oracle-samples/fs-dag
