PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing
January 29, 2026
Authors: Cheng Cui, Ting Sun, Suyin Liang, Tingquan Gao, Zelun Zhang, Jiaxuan Liu, Xueqing Wang, Changda Zhou, Hongen Liu, Manhui Lin, Yue Zhang, Yubo Zhang, Yi Liu, Dianhai Yu, Yanjun Ma
cs.AI
Abstract
We introduce PaddleOCR-VL-1.5, an upgraded model achieving a new state-of-the-art (SOTA) accuracy of 94.5% on OmniDocBench v1.5. To rigorously evaluate robustness against real-world physical distortions, including scanning, skew, warping, screen-photography, and illumination, we propose the Real5-OmniDocBench benchmark. Experimental results demonstrate that this enhanced model attains SOTA performance on the newly curated benchmark. Furthermore, we extend the model's capabilities by incorporating seal recognition and text spotting tasks, while remaining a 0.9B ultra-compact VLM with high efficiency. Code: https://github.com/PaddlePaddle/PaddleOCR