Noise-Aware Training of Layout-Aware Language Models
March 30, 2024
Authors: Ritesh Sarkhel, Xiaoqi Ren, Lauro Beltrao Costa, Guolong Su, Vincent Perot, Yanan Xie, Emmanouil Koukoumidis, Arnab Nandi
cs.AI
Abstract
A visually rich document (VRD) utilizes visual features along with linguistic
cues to disseminate information. Training a custom extractor that identifies
named entities from a document requires a large number of instances of the
target document type, annotated in both the textual and visual modalities. This is an
expensive bottleneck in enterprise scenarios, where we want to train custom
extractors for thousands of different document types in a scalable way.
Pre-training an extractor model on unlabeled instances of the target document
type, followed by a fine-tuning step on human-labeled instances, does not work
in these scenarios, as it surpasses the maximum allowable training time
allocated for the extractor. In this paper, we address this scenario by
proposing a Noise-Aware Training method (NAT). Instead of acquiring
expensive human-labeled documents, NAT utilizes weakly labeled documents to
train an extractor in a scalable way. To avoid degradation in the model's
quality due to noisy, weakly labeled samples, NAT estimates the confidence of
each training sample and incorporates it as an uncertainty measure during
training. We train multiple state-of-the-art extractor models using NAT.
Experiments on a number of publicly available and in-house datasets show that
NAT-trained models are not only robust in performance, outperforming a
transfer-learning baseline by up to 6% in macro-F1 score, but also more
label-efficient, reducing the amount of human effort required to obtain
comparable performance by up to 73%.
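The abstract describes NAT as estimating a confidence score for each weakly labeled training sample and folding it into training as an uncertainty measure. As a rough illustration of that idea only, the sketch below down-weights the loss of low-confidence documents; the function name, tensor shapes, and the simple linear weighting are assumptions made for illustration and are not taken from the paper.

    # Illustrative sketch: confidence-weighted token-classification loss for
    # weakly labeled documents. Names and shapes are hypothetical.
    import torch
    import torch.nn.functional as F

    def confidence_weighted_loss(logits, weak_labels, confidences, ignore_index=-100):
        """logits:      (batch, seq_len, num_labels) model outputs
        weak_labels: (batch, seq_len) entity labels from the weak labeler
        confidences: (batch,) estimated confidence in [0, 1] per document
        """
        # Per-token cross-entropy against the (possibly noisy) weak labels.
        per_token = F.cross_entropy(
            logits.transpose(1, 2),      # (batch, num_labels, seq_len)
            weak_labels,
            ignore_index=ignore_index,
            reduction="none",
        )                                 # (batch, seq_len)
        mask = (weak_labels != ignore_index).float()
        per_doc = (per_token * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
        # Low-confidence (likely noisy) documents contribute less to the update.
        return (confidences * per_doc).mean()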