OmniOCR: Generalist OCR for Ethnic Minority Languages

Abstract

Optical character recognition (OCR) has advanced rapidly with deep learning and multimodal models, yet most methods focus on well-resourced scripts such as Latin and Chinese. Ethnic minority languages remain underexplored because of their complex writing systems, scarce annotations, and diverse historical and modern forms, which makes generalization in low-resource or zero-shot settings challenging. To address these challenges, we present OmniOCR, a universal framework for ethnic minority scripts. OmniOCR introduces Dynamic Low-Rank Adaptation (Dynamic LoRA), which dynamically allocates model capacity across layers and scripts, enabling effective adaptation while preserving knowledge. A sparsity regularization term prunes redundant updates, keeping the adaptation compact and efficient without extra inference cost. Evaluations on TibetanMNIST, Shui, ancient Yi, and Dongba show that OmniOCR outperforms zero-shot foundation models and standard post-training, achieving state-of-the-art accuracy with superior parameter efficiency. Compared with the state-of-the-art baseline models, it improves accuracy by 39% to 66% on these four datasets. Code: https://github.com/AIGeeksGroup/OmniOCR
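The abstract does not spell out how Dynamic LoRA allocates capacity, so the following is only a minimal sketch of one plausible reading: a low-rank update whose rank components are scaled by learnable gates, an L1 penalty that pushes unused gates toward zero, and a final merge into the frozen weight so that inference cost is unchanged. The gate vector, the threshold, the function names, and the toy values are all assumptions made for illustration and are not taken from the paper or its code.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_delta(B, gates, A):
    """Low-rank update B @ diag(gates) @ A; gates[i] scales rank component i."""
    gated_A = [[g * v for v in row] for g, row in zip(gates, A)]
    return matmul(B, gated_A)


def l1_penalty(gates, lam):
    """Sparsity regularizer added to the training loss (assumed L1 form)."""
    return lam * sum(abs(g) for g in gates)


def prune(gates, threshold):
    """Zero out rank components whose gate magnitude is below the threshold."""
    return [g if abs(g) >= threshold else 0.0 for g in gates]


def merge(W, delta):
    """Fold the update into the frozen weight so inference uses one matrix."""
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


# Toy 2x2 layer with a rank-2 adapter (all values are illustrative).
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen pretrained weight
B = [[1.0, 2.0], [0.0, 1.0]]   # d_out x r
A = [[0.5, 0.0], [0.0, 4.0]]   # r x d_in
gates = [1.0, 0.01]            # second component is nearly unused

pruned = prune(gates, threshold=0.05)
W_merged = merge(W, lora_delta(B, pruned, A))
active_rank = sum(1 for g in pruned if g != 0.0)
```

In this toy case the second rank component is pruned, so the layer keeps an effective rank of 1. Because the surviving update is added into W once, the adapted layer is a single matrix of the original shape at inference time. A per-layer or per-script gate vector would let the retained rank differ across layers and scripts, which is the kind of allocation the abstract describes.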
