PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training
June 2, 2026
Authors: Zelun Zhang, Hongen Liu, Suyin Liang, Yubo Zhang, Yiqing Xiang, Jiaxuan Liu, Ting Sun, Manhui Lin, Yue Zhang, Changda Zhou, Tingquan Gao, Cheng Cui, Yi Liu, Dianhai Yu, Yanjun Ma
cs.AI
Abstract
We introduce PaddleOCR-VL-1.6, an upgraded compact document parsing model built upon PaddleOCR-VL-1.5. Although PaddleOCR-VL-1.5 establishes a strong 0.9B baseline, its remaining errors concentrate in under-optimized regions where model behavior is unstable, data coverage is sparse, or supervision is unreliable. Rather than expanding the training corpus indiscriminately, PaddleOCR-VL-1.6 introduces a region-aware data optimization framework that identifies weak regions from the previous model, applies targeted enhancement to these regions, and improves the reliability of supervision signals. It further adopts a progressive post-training recipe based on curated data selection and reinforcement learning, pushing model performance to a higher level through staged optimization. PaddleOCR-VL-1.6 achieves a new state-of-the-art score of 96.33% on OmniDocBench v1.6, demonstrates strong competitiveness against top-tier VLMs, and provides a practical post-training recipe for the PaddleOCR-VL series.
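The abstract does not specify how weak regions are identified. As a purely illustrative sketch, one hypothetical way to flag under-optimized regions is to group the previous model's per-sample scores by region and mark regions with few samples (sparse coverage), a low mean score, or a high score spread (unstable behavior). The function name, the region labels, the thresholds, and the score values below are all assumptions made for illustration and are not taken from the paper. Unreliable supervision is not modeled here, since detecting it would need label-quality signals the abstract does not describe.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_weak_regions(records, min_count=3, min_mean=0.9, max_std=0.1):
    """Flag regions that look under-optimized.

    records: iterable of (region, score) pairs, where score in [0, 1] is a
    hypothetical per-sample quality score from the previous model.
    Returns a dict mapping each weak region to the list of reasons it was flagged.
    All thresholds are illustrative assumptions.
    """
    by_region = defaultdict(list)
    for region, score in records:
        by_region[region].append(score)

    weak = {}
    for region, scores in by_region.items():
        reasons = []
        if len(scores) < min_count:      # too few samples: sparse coverage
            reasons.append("sparse")
        if mean(scores) < min_mean:      # low average score
            reasons.append("low_accuracy")
        if pstdev(scores) > max_std:     # large spread: unstable behavior
            reasons.append("unstable")
        if reasons:
            weak[region] = reasons
    return weak


# Toy, made-up scores used only to exercise the function.
records = [
    ("printed_text", 0.98), ("printed_text", 0.97), ("printed_text", 0.99),
    ("handwritten_formula", 0.95), ("handwritten_formula", 0.40),
    ("handwritten_formula", 0.70),
    ("seal", 0.92),
]
weak = find_weak_regions(records)
for region, reasons in weak.items():
    print(region, reasons)
```

In a pipeline of this kind, the flagged regions would then be the candidates for targeted data enhancement, rather than growing the whole corpus uniformly.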