OmniOCR: Generalist OCR for Ethnic Minority Languages

Abstract

Optical character recognition (OCR) has advanced rapidly with deep learning and multimodal models, yet most methods focus on well-resourced scripts such as Latin and Chinese. Ethnic minority languages remain underexplored because of their complex writing systems, scarce annotations, and diverse historical and modern forms, all of which make generalization in low-resource or zero-shot settings challenging. To address these challenges, we present OmniOCR, a universal framework for ethnic minority scripts. OmniOCR introduces Dynamic Low-Rank Adaptation (Dynamic LoRA), which dynamically allocates model capacity across layers and scripts, enabling effective adaptation while preserving existing knowledge. A sparsity regularization prunes redundant updates, ensuring compact and efficient adaptation without extra inference cost. Evaluations on the TibetanMNIST, Shui, ancient Yi, and Dongba datasets show that OmniOCR outperforms zero-shot foundation models and standard post-training, achieving state-of-the-art accuracy with superior parameter efficiency. Compared with state-of-the-art baseline models, it improves accuracy by 39%-66% on these four datasets. Code: https://github.com/AIGeeksGroup/OmniOCR.
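The abstract does not give the formulation of Dynamic LoRA, so the following is only a generic sketch of the two ideas it names: a low-rank update whose rank components can be switched off by a sparsity penalty, and a merge step that folds the update into the base weight so inference costs nothing extra. The per-rank `gates`, the L1 penalty, the pruning threshold, and every function name here are illustrative assumptions, not the authors' implementation.

```python
# Generic gated low-rank adaptation sketch (assumed formulation, not OmniOCR's code).
# Matrices are plain lists of rows so the example has no dependencies.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def gated_delta(B, gates, A):
    """Low-rank update B @ diag(gates) @ A, with B (d_out x r) and A (r x d_in)."""
    scaled_B = [[b * g for b, g in zip(row, gates)] for row in B]
    return matmul(scaled_B, A)

def l1_penalty(gates, lam):
    """Sparsity term added to the training loss; pushes unused gates toward zero."""
    return lam * sum(abs(g) for g in gates)

def prune(gates, threshold):
    """Zero out gates whose magnitude fell below the (hypothetical) threshold."""
    return [g if abs(g) >= threshold else 0.0 for g in gates]

def merge(W, B, gates, A):
    """Fold the update into the frozen weight: W' = W + B @ diag(gates) @ A."""
    delta = gated_delta(B, gates, A)
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]

# Toy layer: 2x2 frozen weight, rank-2 adapter, one gate already near zero.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, 0.0], [0.0, 0.5]]
gates = [1.0, 0.01]

penalty = l1_penalty(gates, lam=0.1)
pruned = prune(gates, threshold=0.05)
active_rank = sum(1 for g in pruned if g != 0.0)
W_merged = merge(W, B, pruned, A)
```

In this toy case pruning leaves one active rank component, and `W_merged` has the same shape as `W`, so the adapted layer runs at the same cost as the original one. How capacity is actually distributed across layers and scripts is left to the paper's method section.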
