Out-of-Distribution Detection with Attention Head Masking for Multimodal Document Classification
August 20, 2024
Authors: Christos Constantinou, Georgios Ioannides, Aman Chadha, Aaron Elkins, Edwin Simpson
cs.AI
Abstract
Detecting out-of-distribution (OOD) data is crucial in machine learning
applications to mitigate the risk of model overconfidence, thereby enhancing
the reliability and safety of deployed systems. Most existing OOD detection
methods address uni-modal inputs, such as images or text. In the context of
multi-modal documents, there is a notable lack of research on the performance
of these methods, which have primarily been developed for computer vision
tasks. We propose a novel methodology termed attention head masking (AHM) for
multi-modal OOD tasks in document classification systems. Our empirical
results demonstrate that the proposed AHM method outperforms all
state-of-the-art approaches and decreases the false positive rate (FPR) by up
to 7.5% compared to existing solutions. The methodology generalizes well to
multi-modal data, such as documents, where visual and textual information are
modeled under the same Transformer architecture. To address the scarcity of
high-quality, publicly available document datasets and to encourage further
research on OOD detection for documents, we introduce FinanceDocs, a new
document AI dataset. Our code and dataset are publicly available.
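The abstract does not spell out how attention head masking operates, but the general idea of zeroing out selected attention heads in a Transformer can be sketched as below. This is a minimal illustrative toy, not the paper's implementation: the function names, the per-head weight layout, and the binary `head_mask` vector are all assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def masked_multi_head_attention(x, Wq, Wk, Wv, head_mask):
    """Toy multi-head self-attention where head_mask[h] == 0 silences head h.

    x:          (seq_len, d_model) token representations
    Wq/Wk/Wv:   (n_heads, d_model, d_head) per-head projection weights
    head_mask:  (n_heads,) 1.0 keeps a head, 0.0 zeroes its output
    """
    n_heads, _, d_head = Wq.shape
    outputs = []
    for h in range(n_heads):
        q, k, v = x @ Wq[h], x @ Wk[h], x @ Wv[h]
        attn = softmax(q @ k.T / np.sqrt(d_head))   # (seq_len, seq_len)
        # A masked head contributes an all-zero block to the concatenation.
        outputs.append(head_mask[h] * (attn @ v))
    return np.concatenate(outputs, axis=-1)        # (seq_len, n_heads * d_head)
```

In an OOD-detection setting, representations produced with different head masks could then be compared or scored against in-distribution statistics; how AHM scores them is described in the paper itself, not in this sketch.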