FS-DAG: Few Shot Domain Adapting Graph Networks for Visually Rich Document Understanding

Abstract

In this work, we propose Few Shot Domain Adapting Graph (FS-DAG), a scalable and efficient model architecture for visually rich document understanding (VRDU) in few-shot settings. FS-DAG uses domain-specific and language- and vision-specific backbones within a modular framework to adapt to diverse document types with minimal data. The model is robust to practical challenges that are critical in real-world deployments, such as OCR errors, misspellings, and domain shifts. FS-DAG achieves strong performance with fewer than 90M parameters, making it well suited to real-world Information Extraction (IE) applications where computational resources are limited. We demonstrate FS-DAG's capability through extensive experiments on information extraction tasks, showing significant improvements in convergence speed and performance compared to state-of-the-art methods. Additionally, this work highlights the ongoing progress in developing smaller, more efficient models that do not compromise on performance. Code: https://github.com/oracle-samples/fs-dag
