

PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing

January 29, 2026
Authors: Cheng Cui, Ting Sun, Suyin Liang, Tingquan Gao, Zelun Zhang, Jiaxuan Liu, Xueqing Wang, Changda Zhou, Hongen Liu, Manhui Lin, Yue Zhang, Yubo Zhang, Yi Liu, Dianhai Yu, Yanjun Ma
cs.AI

Abstract

We introduce PaddleOCR-VL-1.5, an upgraded model achieving a new state-of-the-art (SOTA) accuracy of 94.5% on OmniDocBench v1.5. To rigorously evaluate robustness against real-world physical distortions, including scanning distortion, skew, warping, screen re-photography, and illumination variation, we propose the Real5-OmniDocBench benchmark. Experimental results demonstrate that the enhanced model attains SOTA performance on this newly curated benchmark. Furthermore, we extend the model's capabilities by incorporating seal recognition and text spotting tasks, while keeping it an ultra-compact, highly efficient 0.9B VLM. Code: https://github.com/PaddlePaddle/PaddleOCR
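
The abstract points to the PaddleOCR repository for code. As a rough illustration of how a released document-parsing pipeline from that repository is typically invoked, the sketch below assumes the PaddleOCRVL pipeline class and its predict / save_to_* helpers that follow the repository's pipeline conventions; the class name, the sample image path, and whether the 1.5 weights are loaded by default are assumptions to be checked against the current README, not a confirmed description of this release.

# Minimal usage sketch, assuming the PaddleOCR-VL pipeline exposed by
# PaddlePaddle/PaddleOCR follows its documented PaddleOCRVL interface.
# Class name, predict(), and save_to_*() helpers are assumptions based on
# that repository's pipeline conventions; verify against the README.
from paddleocr import PaddleOCRVL

pipeline = PaddleOCRVL()  # load the default PaddleOCR-VL document-parsing pipeline

# "sample_doc.png" is a placeholder path to a document image (scan or photo).
results = pipeline.predict("sample_doc.png")

for res in results:
    res.print()                               # structured layout and text output
    res.save_to_json(save_path="output")      # machine-readable parsing result
    res.save_to_markdown(save_path="output")  # reconstructed document as Markdown

For real use, the repository's installation and model-download instructions should be followed first; the sketch only shows the intended call pattern for parsing a single document image.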