PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training
June 2, 2026
Authors: Zelun Zhang, Hongen Liu, Suyin Liang, Yubo Zhang, Yiqing Xiang, Jiaxuan Liu, Ting Sun, Manhui Lin, Yue Zhang, Changda Zhou, Tingquan Gao, Cheng Cui, Yi Liu, Dianhai Yu, Yanjun Ma
cs.AI
Abstract
We introduce PaddleOCR-VL-1.6, an upgraded compact document parsing model built upon PaddleOCR-VL-1.5. Although PaddleOCR-VL-1.5 establishes a strong 0.9B baseline, its remaining errors concentrate in under-optimized regions where model behavior is unstable, data coverage is sparse, or supervision is unreliable. Rather than expanding the training corpus indiscriminately, PaddleOCR-VL-1.6 introduces a region-aware data optimization framework that identifies weak regions from the previous model, applies targeted enhancement to these regions, and improves the reliability of supervision signals. It further adopts a progressive post-training recipe based on curated data selection and reinforcement learning, which improves performance further through staged optimization. PaddleOCR-VL-1.6 achieves a new state-of-the-art score of 96.33% on OmniDocBench v1.6, is competitive with top-tier VLMs, and provides a practical post-training recipe for the PaddleOCR-VL series.
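The abstract does not say how weak regions are detected. The following is a minimal sketch of one way to flag the three failure modes it names: unstable behavior, sparse coverage, and unreliable supervision. It rests on several assumptions. Each evaluation sample is assumed to carry a hypothetical `region` label (for example "table" or "formula"), scores from several repeated decodes of the same input, and a flag recording whether its ground-truth label passed some verification check. `EvalRecord`, `find_weak_regions`, and every threshold are illustrative names and values, not PaddleOCR-VL-1.6's actual pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List


@dataclass
class EvalRecord:
    region: str           # hypothetical region label, e.g. "table", "formula"
    scores: List[float]   # scores in [0, 1] from repeated decodes of one sample
    label_verified: bool  # hypothetical: did the ground truth pass a consistency check?


def find_weak_regions(
    records: List[EvalRecord],
    min_count: int = 50,        # fewer samples than this -> "sparse" (assumed threshold)
    max_std: float = 0.1,       # mean per-sample score std above this -> "unstable"
    min_verified: float = 0.9,  # verified-label fraction below this -> "unreliable_supervision"
    min_score: float = 0.9,     # mean score below this -> "low_accuracy"
) -> Dict[str, List[str]]:
    """Group records by region and return {region: [reasons]} for flagged regions only."""
    by_region: Dict[str, List[EvalRecord]] = defaultdict(list)
    for rec in records:
        by_region[rec.region].append(rec)

    weak: Dict[str, List[str]] = {}
    for region, recs in by_region.items():
        reasons = []
        if len(recs) < min_count:
            reasons.append("sparse")
        # Instability: how much repeated decodes of the same input disagree.
        if mean(pstdev(r.scores) for r in recs) > max_std:
            reasons.append("unstable")
        # Supervision reliability: fraction of samples whose labels were verified.
        if mean(r.label_verified for r in recs) < min_verified:
            reasons.append("unreliable_supervision")
        if mean(mean(r.scores) for r in recs) < min_score:
            reasons.append("low_accuracy")
        if reasons:
            weak[region] = reasons
    return weak


# Toy data: tables decode inconsistently, formulas are rare and poorly labeled.
records = (
    [EvalRecord("table", [0.95, 0.55], True)] * 60
    + [EvalRecord("text", [0.99, 0.98], True)] * 60
    + [EvalRecord("formula", [0.90, 0.92], False)] * 10
)
weak = find_weak_regions(records)
print(weak)
```

In this sketch, the flagged regions and their reasons would drive targeted enhancement, such as synthesizing more samples for "sparse" regions or relabeling "unreliable_supervision" ones. Using the spread across repeated decodes as the instability signal is one plausible choice among several.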