OmniOCR: Generalist OCR for Ethnic Minority Languages

Abstract

Optical character recognition (OCR) has advanced rapidly with deep learning and multimodal models, yet most methods focus on well-resourced scripts such as Latin and Chinese. Scripts of ethnic minority languages remain underexplored because of their complex writing systems, scarce annotations, and diverse historical and modern forms, which makes generalization in low-resource or zero-shot settings challenging. To address these challenges, we present OmniOCR, a universal framework for ethnic minority scripts. OmniOCR introduces Dynamic Low-Rank Adaptation (Dynamic LoRA), which allocates model capacity dynamically across layers and scripts, enabling effective adaptation while preserving existing knowledge. A sparsity regularization prunes redundant parameter updates, ensuring compact and efficient adaptation without extra inference cost. Evaluations on TibetanMNIST, Shui, ancient Yi, and Dongba show that OmniOCR outperforms zero-shot foundation models and standard post-training methods, achieving state-of-the-art accuracy with superior parameter efficiency. Compared with the state-of-the-art baseline models, it improves accuracy by 39%-66% on these four datasets. Code: https://github.com/AIGeeksGroup/OmniOCR.
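The abstract does not spell out how Dynamic LoRA is implemented, so the following is only a minimal illustrative sketch of the general idea it describes: a low-rank update whose per-rank capacity can be switched off by a sparsity penalty, then merged into the frozen weight so inference costs nothing extra. The gate vector, the L1 form of the regularizer, the pruning threshold, and all function names are assumptions made for illustration; they are not taken from the OmniOCR code.

```python
import numpy as np

def lora_delta(A, B, gate):
    """Low-rank update B @ diag(gate) @ A; a zero gate entry disables that rank."""
    return B @ (gate[:, None] * A)

def l1_penalty(gate, lam):
    """Assumed sparsity regularizer on the gates: lam * ||gate||_1."""
    return lam * np.abs(gate).sum()

def prune(gate, threshold):
    """Zero out gates whose magnitude is below the (hypothetical) threshold."""
    return np.where(np.abs(gate) >= threshold, gate, 0.0)

def merge(W, A, B, gate):
    """Fold the update into the frozen weight so inference uses one matrix."""
    return W + lora_delta(A, B, gate)

rng = np.random.default_rng(0)
d_out, d_in, r = 6, 5, 4                    # toy sizes, chosen arbitrarily
W = rng.normal(size=(d_out, d_in))          # frozen pretrained weight
A = rng.normal(size=(r, d_in))              # LoRA down-projection
B = rng.normal(size=(d_out, r))             # LoRA up-projection
gate = np.array([1.0, 0.02, 0.7, -0.01])    # pretend these were learned

pruned = prune(gate, threshold=0.05)
effective_rank = int(np.count_nonzero(pruned))  # capacity kept for this layer

W_merged = merge(W, A, B, pruned)
x = rng.normal(size=d_in)
y_adapter = W @ x + lora_delta(A, B, pruned) @ x  # adapter kept separate
y_merged = W_merged @ x                           # adapter folded into W
```

In this toy setup, each layer (or script) would carry its own gate vector, so the number of surviving ranks can differ from layer to layer; because the merged matrix has the same shape as the original weight, the adapted model runs at the same cost as the base model.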
