MMRefine: Unveiling the Obstacles to Robust Refinement in Multimodal Large Language Models

Abstract

This paper introduces MMRefine, a MultiModal Refinement benchmark designed to evaluate the error refinement capabilities of Multimodal Large Language Models (MLLMs). As the emphasis shifts toward enhancing reasoning during inference, MMRefine goes beyond comparing final accuracy before and after refinement: it provides a framework that evaluates MLLMs' abilities to detect and correct errors across six distinct scenarios. The benchmark also categorizes errors into six types to analyze refinement performance. Experiments with various open and closed MLLMs reveal bottlenecks and factors that impede refinement performance, highlighting areas for improvement in effective reasoning enhancement. Our code and dataset are publicly available at https://github.com/naver-ai/MMRefine.
