OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Abstract

Recent advances in large reasoning models have fueled growing interest in extending such capabilities to multimodal domains. However, despite notable progress in visual reasoning, the lack of transparent and reproducible data curation and training strategies remains a major barrier to scalable research. In this work, we introduce OpenMMReasoner, a fully transparent two-stage recipe for multimodal reasoning spanning supervised fine-tuning (SFT) and reinforcement learning (RL). In the SFT stage, we construct an 874K-sample cold-start dataset with rigorous step-by-step validation, providing a strong foundation for reasoning capabilities. The subsequent RL stage leverages a 74K-sample dataset across diverse domains to further sharpen and stabilize these abilities, resulting in a more robust and efficient learning process. Extensive evaluations demonstrate that our training recipe not only surpasses strong baselines but also highlights the critical role of data quality and training design in shaping multimodal reasoning performance. Notably, our method achieves an 11.6% improvement over the Qwen2.5-VL-7B-Instruct baseline across nine multimodal reasoning benchmarks, establishing a solid empirical foundation for future large-scale multimodal reasoning research. We have open-sourced all our code, pipeline, and data at https://github.com/EvolvingLMMs-Lab/OpenMMReasoner.
