PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing through Under-Optimized Region Refinement and Progressive Post-Training

Abstract

We introduce PaddleOCR-VL-1.6, an upgraded compact document parsing model built on PaddleOCR-VL-1.5. Although PaddleOCR-VL-1.5 establishes a strong 0.9B baseline, its remaining errors are concentrated in under-optimized regions: those where model behavior is unstable, data coverage is sparse, or supervision is unreliable. Rather than expanding the training corpus indiscriminately, PaddleOCR-VL-1.6 introduces a region-aware data optimization framework that identifies the weak regions of the previous model, applies targeted enhancement to those regions, and improves the reliability of the supervision signals. It further adopts a progressive post-training recipe based on curated data selection and reinforcement learning, raising model performance through staged optimization. PaddleOCR-VL-1.6 achieves a new state-of-the-art score of 96.33% on OmniDocBench v1.6, is strongly competitive with top-tier VLMs, and provides a practical post-training recipe for the PaddleOCR-VL series.