DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has been shown to be effective in enhancing the visual reflection and reasoning capabilities of Large Multimodal Models (LMMs). However, existing datasets are predominantly derived from either small-scale manual construction or the recombination of prior resources, which limits data diversity and coverage and thereby constrains further gains in model performance. To this end, we introduce DeepVision-103K, a comprehensive dataset for RLVR training that covers diverse K12 mathematical topics, extensive knowledge points, and rich visual elements. Models trained on DeepVision achieve strong performance on multimodal mathematical benchmarks and generalize effectively to general multimodal reasoning tasks. Further analysis reveals enhanced visual perception, reflection, and reasoning capabilities in the trained models, validating DeepVision's effectiveness for advancing multimodal reasoning. Data: https://huggingface.co/datasets/skylenage/DeepVision-103K
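The abstract does not specify the reward function used for RLVR training. To illustrate what "verifiable" typically means in this setting, here is a minimal sketch that assumes the model writes its final answer inside a last \boxed{...} and receives a binary reward for matching the reference answer after light normalization. The function names (extract_boxed, normalize, verifiable_reward) and the matching rules are hypothetical illustrations, not the DeepVision training recipe.

```python
import re
from fractions import Fraction
from typing import Optional


def extract_boxed(response: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in a response, or None.

    Braces are counted so nested content such as \\boxed{\\frac{1}{2}} survives.
    """
    start = response.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    out = []
    while i < len(response):
        ch = response[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out).strip()
        out.append(ch)
        i += 1
    return None  # unbalanced braces: treat as no answer


def normalize(ans: str) -> str:
    """Hypothetical light normalization: drop spaces and $, rewrite \\frac{a}{b} as a/b."""
    s = ans.replace(" ", "").replace("$", "")
    return re.sub(r"\\d?frac\{([^{}]+)\}\{([^{}]+)\}", r"\1/\2", s)


def verifiable_reward(response: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the boxed answer matches the reference, else 0.0."""
    pred = extract_boxed(response)
    if pred is None:
        return 0.0
    p, g = normalize(pred), normalize(ground_truth)
    try:
        # Numeric answers are compared exactly as rationals, so 1/2 == 0.5.
        return 1.0 if Fraction(p) == Fraction(g) else 0.0
    except (ValueError, ZeroDivisionError):
        # Non-numeric answers (e.g. multiple-choice letters) fall back to string match.
        return 1.0 if p == g else 0.0
```

Because the check is a deterministic comparison against a stored answer, no learned reward model is involved. This is the property that makes a dataset with reliable reference answers usable for RLVR. Real pipelines usually need a more robust equivalence checker (units, intervals, symbolic expressions) than this sketch.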