MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning

Abstract

We present MM-Eureka, a multimodal reasoning model that successfully extends large-scale rule-based reinforcement learning (RL) to multimodal reasoning. While rule-based RL has been remarkably successful at improving the reasoning abilities of large language models (LLMs) in text domains, applying it to multimodal settings has remained challenging. Our work reproduces key characteristics of text-based RL systems such as DeepSeek-R1 in the multimodal space, including steady increases in accuracy reward and response length and the emergence of reflection behaviors. We demonstrate that both instruction-tuned and pre-trained models can develop strong multimodal reasoning capabilities through rule-based RL without supervised fine-tuning, with superior data efficiency compared to alternative approaches. To foster further research in this area, we open-source our complete pipeline: all code, models, data, and related resources are available at https://github.com/ModalMinds/MM-EUREKA
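Rule-based RL replaces a learned reward model with deterministic checks on the model's output. The sketch below illustrates what an "accuracy reward" of this kind can look like. It is a minimal sketch under stated assumptions, not MM-Eureka's implementation: the `<think>`/`<answer>` output template, the whitespace- and case-insensitive exact-match rule, the added format check (as used in R1-style setups), and the `format_weight` of 0.5 are all hypothetical choices made for illustration.

```python
import re

# Hypothetical output template (an assumption, not necessarily MM-Eureka's):
# reasoning inside <think>...</think>, final answer inside <answer>...</answer>.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
FORMAT_RE = re.compile(r"^\s*<think>.*?</think>\s*<answer>.*?</answer>\s*$", re.DOTALL)


def normalize(ans: str) -> str:
    """Remove all whitespace and lowercase, so ' Yes ' and 'yes' compare equal."""
    return "".join(ans.split()).lower()


def accuracy_reward(response: str, ground_truth: str) -> float:
    """1.0 if the extracted final answer matches the reference, else 0.0."""
    match = ANSWER_RE.search(response)
    if match is None:
        # No parsable answer: a rule-based checker simply gives zero reward.
        return 0.0
    return 1.0 if normalize(match.group(1)) == normalize(ground_truth) else 0.0


def format_reward(response: str) -> float:
    """1.0 if the whole response follows the <think>/<answer> template, else 0.0."""
    return 1.0 if FORMAT_RE.match(response) else 0.0


def rule_based_reward(response: str, ground_truth: str, format_weight: float = 0.5) -> float:
    """Hypothetical combination: accuracy reward plus a weighted format bonus."""
    return accuracy_reward(response, ground_truth) + format_weight * format_reward(response)


if __name__ == "__main__":
    print(rule_based_reward("<think>2 + 2 = 4</think><answer>4</answer>", "4"))
    print(rule_based_reward("The answer is 4", "4"))
    print(rule_based_reward("<think>guess</think><answer> 5 </answer>", "4"))
```

Because every check is a deterministic rule, the reward cannot be "hacked" the way a learned reward model sometimes can; the trade-off is that it only applies to tasks with verifiable reference answers.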
