DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has been shown to be effective in enhancing the visual reflection and reasoning capabilities of Large Multimodal Models (LMMs). However, existing datasets are predominantly derived from either small-scale manual construction or recombination of prior resources, which limits data diversity and coverage and thereby constrains further gains in model performance. To this end, we introduce DeepVision-103K, a comprehensive dataset for RLVR training that covers diverse K12 mathematical topics, extensive knowledge points, and rich visual elements. Models trained on DeepVision achieve strong performance on multimodal mathematical benchmarks and generalize effectively to general multimodal reasoning tasks. Further analysis reveals enhanced visual perception, reflection, and reasoning capabilities in the trained models, validating DeepVision's effectiveness for advancing multimodal reasoning. Data: https://huggingface.co/datasets/skylenage/DeepVision-103K
