Qalam: A Multimodal LLM for Arabic Optical Character and Handwriting Recognition

July 18, 2024
Authors: Gagan Bhatia, El Moatez Billah Nagoudi, Fakhraddin Alwajih, Muhammad Abdul-Mageed
cs.AI

Abstract

Arabic Optical Character Recognition (OCR) and Handwriting Recognition (HWR) pose unique challenges because the Arabic script is cursive and context-sensitive. This study introduces Qalam, a novel foundation model designed for Arabic OCR and HWR, built on a SwinV2 encoder and RoBERTa decoder architecture. Our model significantly outperforms existing methods, achieving a Word Error Rate (WER) of just 0.80% in HWR tasks and 1.18% in OCR tasks. We train Qalam on a diverse dataset, including over 4.5 million images from Arabic manuscripts and a synthetic dataset comprising 60k image-text pairs. Notably, Qalam demonstrates exceptional handling of Arabic diacritics, a critical feature of the Arabic script. Furthermore, it shows a remarkable ability to process high-resolution inputs, addressing a common limitation in current OCR systems. These advancements underscore Qalam's potential as a leading solution for Arabic script recognition, offering a significant leap in accuracy and efficiency.
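
The abstract reports its results as Word Error Rate (WER). As a rough illustration of that metric, the sketch below computes WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example; the abstract does not describe the paper's own evaluation script or any text normalization it applies.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace and compared as exact
    strings, so a missing or wrong diacritic makes the whole word an error.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("a b c d", "a x c d"))  # 0.25
```

Because the comparison is on exact strings, a prediction that drops a diacritic from a word counts as one word error under this sketch. Whether diacritics are scored this strictly in the reported numbers is not stated in the abstract.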
