
MMRefine: Unveiling the Obstacles to Robust Refinement in Multimodal Large Language Models

June 5, 2025
作者: Gio Paik, Geewook Kim, Jinbae Im
cs.AI

Abstract

This paper introduces MMRefine, a MultiModal Refinement benchmark designed to evaluate the error refinement capabilities of Multimodal Large Language Models (MLLMs). As the emphasis shifts toward enhancing reasoning during inference, MMRefine provides a framework that evaluates MLLMs' ability to detect and correct errors across six distinct scenarios, rather than merely comparing final accuracy before and after refinement. Furthermore, the benchmark analyzes refinement performance by categorizing errors into six error types. Experiments with various open- and closed-source MLLMs reveal bottlenecks and factors impeding refinement performance, highlighting areas for improvement in effective reasoning enhancement. Our code and dataset are publicly available at https://github.com/naver-ai/MMRefine.
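To make the evaluation setting concrete, the sketch below outlines one way a refinement benchmark of this kind could be scored: the model under test is shown a multimodal problem together with a candidate solution, asked to flag and fix any error, and each outcome is bucketed by whether an error was present, detected, and actually corrected. This is only an illustrative sketch; the names (`Sample`, `call_mllm`, `answers_match`, `evaluate`) are hypothetical stand-ins rather than MMRefine's actual API, and the outcome buckets are a simplification of the paper's six refinement scenarios and six error types.

```python
# Illustrative sketch of a refinement-evaluation loop in the spirit of MMRefine.
# All names here are hypothetical placeholders, not the benchmark's real interface.
from dataclasses import dataclass


@dataclass
class Sample:
    image_path: str        # multimodal input (e.g., a figure or diagram)
    question: str
    initial_solution: str  # candidate solution to be refined (may contain an error)
    has_error: bool        # ground truth: does the initial solution contain an error?
    gold_answer: str


def call_mllm(image_path: str, question: str, solution: str) -> tuple[bool, str]:
    """Ask the model under test to verify the solution and, if needed, refine it.
    Returns (error_flagged, refined_answer). Placeholder for an actual model call."""
    raise NotImplementedError


def answers_match(predicted: str, gold: str) -> bool:
    """Placeholder answer checker (e.g., normalized exact match)."""
    return predicted.strip().lower() == gold.strip().lower()


def evaluate(samples: list[Sample]) -> dict[str, int]:
    """Bucket each refinement attempt by detection and correction outcome."""
    counts = {"corrected": 0, "detected_not_corrected": 0,
              "missed_error": 0, "false_alarm": 0, "correctly_kept": 0}
    for s in samples:
        flagged, refined = call_mllm(s.image_path, s.question, s.initial_solution)
        if s.has_error:
            if not flagged:
                counts["missed_error"] += 1          # error present but not detected
            elif answers_match(refined, s.gold_answer):
                counts["corrected"] += 1             # detected and successfully fixed
            else:
                counts["detected_not_corrected"] += 1
        else:
            # Refining an already-correct solution should leave it untouched.
            counts["false_alarm" if flagged else "correctly_kept"] += 1
    return counts
```

Scoring per outcome bucket, rather than only comparing end-to-end accuracy, is what lets a benchmark like this separate detection failures from correction failures.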