Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models

Abstract

In 2025, at a critical juncture in the pursuit of Artificial General Intelligence (AGI), reinforcement fine-tuning (RFT) has demonstrated significant potential for enhancing the reasoning capability of large language models (LLMs) and has led to the development of cutting-edge AI models such as OpenAI-o1 and DeepSeek-R1. Moreover, the efficient application of RFT to enhance the reasoning capability of multimodal large language models (MLLMs) has attracted widespread attention from the community. In this position paper, we argue that reinforcement fine-tuning powers the reasoning capability of multimodal large language models. We first introduce the fundamental background knowledge that researchers interested in this field should be familiar with. We then summarize the improvements RFT has brought to the reasoning capability of MLLMs in five key points: diverse modalities, diverse tasks and domains, better training algorithms, abundant benchmarks, and thriving engineering frameworks. Finally, we propose five promising directions for future research that the community might consider. We hope that this position paper will provide valuable insights to the community at this pivotal stage in the advancement toward AGI. A summary of work on RFT for MLLMs is available at https://github.com/Sun-Haoyuan23/Awesome-RL-based-Reasoning-MLLMs.
