MMRefine: Unveiling the Obstacles to Robust Refinement in Multimodal Large Language Models

Abstract

This paper introduces MMRefine, a MultiModal Refinement benchmark designed to evaluate the error refinement capabilities of Multimodal Large Language Models (MLLMs). As the emphasis shifts toward enhancing reasoning during inference, MMRefine provides a framework that goes beyond comparing final accuracy before and after refinement: it evaluates MLLMs' abilities to detect and correct errors across six distinct scenarios. The benchmark also analyzes refinement performance by categorizing errors into six error types. Experiments with a range of open and closed MLLMs reveal the bottlenecks and factors that impede refinement performance, highlighting areas for improvement in effective reasoning enhancement. Our code and dataset are publicly available at https://github.com/naver-ai/MMRefine.
