

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

November 20, 2025
作者: Kaichen Zhang, Keming Wu, Zuhao Yang, Kairui Hu, Bin Wang, Ziwei Liu, Xingxuan Li, Lidong Bing
cs.AI

Abstract

Recent advancements in large reasoning models have fueled growing interest in extending such capabilities to multimodal domains. However, despite notable progress in visual reasoning, the lack of transparent and reproducible data curation and training strategies remains a major barrier to scalable research. In this work, we introduce OpenMMReasoner, a fully transparent two-stage recipe for multimodal reasoning spanning supervised fine-tuning (SFT) and reinforcement learning (RL). In the SFT stage, we construct an 874K-sample cold-start dataset with rigorous step-by-step validation, providing a strong foundation for reasoning capabilities. The subsequent RL stage leverages a 74K-sample dataset across diverse domains to further sharpen and stabilize these abilities, resulting in a more robust and efficient learning process. Extensive evaluations demonstrate that our training recipe not only surpasses strong baselines but also highlights the critical role of data quality and training design in shaping multimodal reasoning performance. Notably, our method achieves an 11.6% improvement over the Qwen2.5-VL-7B-Instruct baseline across nine multimodal reasoning benchmarks, establishing a solid empirical foundation for future large-scale multimodal reasoning research. We open-source all our code, pipelines, and data at https://github.com/EvolvingLMMs-Lab/OpenMMReasoner.