Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models

Abstract


In 2025, at a critical juncture in the pursuit of Artificial General Intelligence (AGI), reinforcement fine-tuning (RFT) has demonstrated significant potential for enhancing the reasoning capabilities of large language models (LLMs) and has led to the development of cutting-edge AI models such as OpenAI-o1 and DeepSeek-R1. Moreover, the efficient application of RFT to enhance the reasoning capabilities of multimodal large language models (MLLMs) has attracted widespread attention from the community. In this position paper, we argue that reinforcement fine-tuning powers the reasoning capabilities of multimodal large language models. We first introduce the fundamental background knowledge that researchers interested in this field should be familiar with. We then summarize the improvements RFT has brought to the reasoning capabilities of MLLMs in five key points: diverse modalities, diverse tasks and domains, better training algorithms, abundant benchmarks, and thriving engineering frameworks. Finally, we propose five promising directions for future research that the community might consider. We hope that this position paper will provide valuable insights to the community at this pivotal stage in the advancement toward AGI. A summary of works on RFT for MLLMs is available at https://github.com/Sun-Haoyuan23/Awesome-RL-based-Reasoning-MLLMs.
