
Qalam: A Multimodal LLM for Arabic Optical Character and Handwriting Recognition

July 18, 2024
Authors: Gagan Bhatia, El Moatez Billah Nagoudi, Fakhraddin Alwajih, Muhammad Abdul-Mageed
cs.AI

Abstract

Arabic Optical Character Recognition (OCR) and Handwriting Recognition (HWR) pose unique challenges due to the cursive and context-sensitive nature of the Arabic script. This study introduces Qalam, a novel foundation model designed for Arabic OCR and HWR, built on a SwinV2 encoder and RoBERTa decoder architecture. Our model significantly outperforms existing methods, achieving a Word Error Rate (WER) of just 0.80% in HWR tasks and 1.18% in OCR tasks. We train Qalam on a diverse dataset, including over 4.5 million images from Arabic manuscripts and a synthetic dataset comprising 60k image-text pairs. Notably, Qalam demonstrates exceptional handling of Arabic diacritics, a critical feature in Arabic scripts. Furthermore, it shows a remarkable ability to process high-resolution inputs, addressing a common limitation in current OCR systems. These advancements underscore Qalam's potential as a leading solution for Arabic script recognition, offering a significant leap in accuracy and efficiency.
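
For readers unfamiliar with the metric quoted above, Word Error Rate is conventionally the word-level edit distance between a reference transcription and the model output, divided by the number of reference words. The sketch below illustrates that standard definition only. The function names are hypothetical, and whitespace tokenization with no text normalization is an assumption; the abstract does not state the evaluation protocol the authors used.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] = distance between the ref tokens seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]  # turning ref[:i] into an empty hypothesis costs i deletions
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a spurious word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: words are split on whitespace and compared exactly,
    so a word that differs only in a diacritic counts as an error.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("w1 w2 w3 w4", "w1 wX w3 w4"))
```

Under this definition, a WER of 0.80% corresponds to roughly 8 word-level errors per 1,000 reference words. Because comparison is exact, any diacritic mismatch inside a word makes the whole word count as wrong, which is why diacritic handling matters for this metric.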
