

Out-of-Distribution Detection with Attention Head Masking for Multimodal Document Classification

August 20, 2024
Authors: Christos Constantinou, Georgios Ioannides, Aman Chadha, Aaron Elkins, Edwin Simpson
cs.AI

Abstract

Detecting out-of-distribution (OOD) data is crucial in machine learning applications to mitigate the risk of model overconfidence, thereby enhancing the reliability and safety of deployed systems. The majority of existing OOD detection methods address uni-modal inputs, such as images or texts. In the context of multi-modal documents, there is a notable lack of research on the performance of these methods, which have primarily been developed with a focus on computer vision tasks. We propose a novel methodology, termed attention head masking (AHM), for multi-modal OOD tasks in document classification systems. Our empirical results demonstrate that the proposed AHM method outperforms all state-of-the-art approaches and significantly decreases the false positive rate (FPR) by up to 7.5% compared to existing solutions. This methodology generalizes well to multi-modal data, such as documents, where visual and textual information are modeled under the same Transformer architecture. To address the scarcity of high-quality publicly available document datasets and encourage further research on OOD detection for documents, we introduce FinanceDocs, a new document AI dataset. Our code and dataset are publicly available.
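The abstract only names the core idea, so the following is a minimal sketch of what attention head masking could look like in practice: zeroing out the outputs of selected attention heads in a Transformer layer and scoring the resulting pooled features against in-distribution statistics. This is not the authors' released implementation; the head selection, the Mahalanobis-style distance score, and all function and variable names here are assumptions made purely for illustration.

```python
# Illustrative sketch of attention head masking (AHM) for OOD scoring.
# NOT the paper's code: head choices, pooling, and the distance score
# are hypothetical stand-ins for the authors' actual procedure.
import torch


def masked_attention(q, k, v, head_mask):
    """Scaled dot-product attention with per-head masking.

    q, k, v:   (batch, heads, seq, dim_per_head)
    head_mask: (heads,) tensor of 0/1 entries; masked heads contribute
               nothing to the layer output.
    """
    scale = q.size(-1) ** -0.5
    attn = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    out = attn @ v                                # (batch, heads, seq, dim)
    return out * head_mask.view(1, -1, 1, 1)      # zero out masked heads


def ood_score(features, train_mean, train_cov_inv):
    """Mahalanobis-style distance of pooled features to in-distribution
    training statistics; a larger distance suggests an OOD input."""
    diff = features - train_mean                  # (batch, hidden)
    return torch.einsum("bi,ij,bj->b", diff, train_cov_inv, diff)


if __name__ == "__main__":
    batch, heads, seq, dim = 2, 8, 16, 64
    q = torch.randn(batch, heads, seq, dim)
    k = torch.randn(batch, heads, seq, dim)
    v = torch.randn(batch, heads, seq, dim)

    # Hypothetical choice: keep the first six heads, mask the last two.
    head_mask = torch.tensor([1, 1, 1, 1, 1, 1, 0, 0], dtype=q.dtype)

    ctx = masked_attention(q, k, v, head_mask)    # (batch, heads, seq, dim)
    pooled = ctx.transpose(1, 2).reshape(batch, seq, heads * dim).mean(dim=1)

    # Placeholder in-distribution statistics; in practice these would be
    # estimated from masked features of the in-distribution training set.
    mean = torch.zeros(heads * dim)
    cov_inv = torch.eye(heads * dim)
    print(ood_score(pooled, mean, cov_inv))
```

Under these assumptions, the mask is applied inside a single attention layer of a document Transformer (one that jointly encodes visual and textual tokens), and the OOD decision is made by thresholding the distance score on held-out in-distribution data.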

