MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
February 13, 2025
Authors: Dongzhi Jiang, Renrui Zhang, Ziyu Guo, Yanwei Li, Yu Qi, Xinyan Chen, Liuhui Wang, Jianhan Jin, Claire Guo, Shen Yan, Bo Zhang, Chaoyou Fu, Peng Gao, Hongsheng Li
cs.AI
Abstract
Answering questions with Chain-of-Thought (CoT) has significantly enhanced
the reasoning capabilities of Large Language Models (LLMs), yet its impact on
Large Multimodal Models (LMMs) still lacks a systematic assessment and in-depth
investigation. In this paper, we introduce MME-CoT, a specialized benchmark
evaluating the CoT reasoning performance of LMMs, spanning six domains: math,
science, OCR, logic, space-time, and general scenes. As the first comprehensive
study in this area, we propose a thorough evaluation suite incorporating three
novel metrics that assess the reasoning quality, robustness, and efficiency at
a fine-grained level. Leveraging curated high-quality data and a unique
evaluation strategy, we conduct an in-depth analysis of state-of-the-art LMMs,
uncovering several key insights: 1) Models with a reflection mechanism
demonstrate superior CoT quality, with Kimi k1.5 outperforming GPT-4o and
achieving the highest-quality results; 2) CoT prompting often degrades LMM
performance on perception-heavy tasks, suggesting a potentially harmful
overthinking behavior; and 3) Although the CoT quality is high, LMMs with
reflection exhibit significant inefficiency in both the normal-response and
self-correction phases. We hope MME-CoT serves as a foundation for advancing
multimodal reasoning in LMMs. Project Page: https://mmecot.github.io/
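The second insight suggests a simple A/B evaluation harness: query the same LMM on each question with and without a CoT trigger phrase and compare accuracy. The sketch below illustrates that setup; it is not the paper's released evaluation code, and `query_lmm` is a hypothetical stand-in for whatever multimodal model API is being benchmarked.

```python
# Minimal sketch of a direct-vs-CoT prompting comparison, in the spirit of
# MME-CoT's robustness analysis. Assumptions (not from the paper):
# - `query_lmm` is a hypothetical wrapper around some multimodal model API,
#   returning the model's text response for an (image, prompt) pair.
# - Answers are judged by exact string match; the actual benchmark uses a
#   far more careful, fine-grained evaluation strategy.
from dataclasses import dataclass

COT_TRIGGER = "Let's think step by step."  # common CoT prompt suffix


@dataclass
class Sample:
    image_path: str
    question: str
    answer: str


def query_lmm(image_path: str, prompt: str) -> str:
    """Hypothetical stand-in for a real LMM call (e.g., an HTTP API client)."""
    raise NotImplementedError("plug in your model client here")


def extract_final_answer(response: str) -> str:
    # Naive heuristic: treat the last non-empty line as the final answer.
    lines = [ln.strip() for ln in response.splitlines() if ln.strip()]
    return lines[-1] if lines else ""


def accuracy(samples: list[Sample], use_cot: bool) -> float:
    correct = 0
    for s in samples:
        prompt = s.question + (" " + COT_TRIGGER if use_cot else "")
        response = query_lmm(s.image_path, prompt)
        if extract_final_answer(response).lower() == s.answer.lower():
            correct += 1
    return correct / len(samples)


def cot_delta(samples: list[Sample]) -> float:
    """Accuracy gain (or loss, if negative) from adding the CoT trigger."""
    return accuracy(samples, use_cot=True) - accuracy(samples, use_cot=False)
```

On perception-heavy subsets such as OCR or general scenes, the abstract's second insight predicts that `cot_delta` would often come out negative.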