MMR-Life: Piecing Together Real-life Scenes for Multimodal Multi-image Reasoning

Abstract

Recent progress in the reasoning capabilities of multimodal large language models (MLLMs) has enabled them to address more complex tasks such as scientific analysis and mathematical reasoning. Despite this promise, the reasoning abilities of MLLMs across different real-life scenarios remain largely unexplored, and no standardized benchmark exists for evaluating them. To address this gap, we introduce MMR-Life, a comprehensive benchmark designed to evaluate the diverse multimodal multi-image reasoning capabilities of MLLMs across real-life scenarios. MMR-Life consists of 2,646 multiple-choice questions based on 19,108 images primarily sourced from real-world contexts, covering seven reasoning types: abductive, analogical, causal, deductive, inductive, spatial, and temporal. Unlike existing reasoning benchmarks, MMR-Life does not rely on domain-specific expertise but instead requires models to integrate information across multiple images and apply diverse reasoning abilities. The evaluation of 37 advanced models highlights the substantial challenge posed by MMR-Life: even top models like GPT-5 achieve only 58% accuracy and display considerable variance in performance across reasoning types. Moreover, we analyze the reasoning paradigms of existing MLLMs, exploring how factors such as thinking length, reasoning method, and reasoning type affect their performance. In summary, MMR-Life establishes a comprehensive foundation for evaluating, analyzing, and improving the next generation of multimodal reasoning systems.
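As an illustration of how accuracy on a multiple-choice benchmark can be broken down by reasoning type, the following is a minimal sketch and not the authors' evaluation code. The record keys (`type`, `answer`, `prediction`), the function name `accuracy_by_type`, and the toy records are all hypothetical; only the seven reasoning-type names come from the abstract.

```python
from collections import defaultdict

# The seven reasoning types named in the abstract.
REASONING_TYPES = (
    "abductive", "analogical", "causal", "deductive",
    "inductive", "spatial", "temporal",
)

def accuracy_by_type(records):
    """Return (per-type accuracy, overall accuracy).

    `records` is an iterable of dicts with hypothetical keys:
    'type' (reasoning type), 'answer' (gold option), 'prediction' (model option).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        total[record["type"]] += 1
        correct[record["type"]] += int(record["prediction"] == record["answer"])
    per_type = {t: correct[t] / total[t] for t in total}
    overall = sum(correct.values()) / sum(total.values())
    return per_type, overall

# Toy data made up for illustration; these are not benchmark results.
toy_records = [
    {"type": "causal", "answer": "A", "prediction": "A"},
    {"type": "causal", "answer": "B", "prediction": "C"},
    {"type": "spatial", "answer": "D", "prediction": "D"},
    {"type": "temporal", "answer": "A", "prediction": "B"},
]

per_type, overall = accuracy_by_type(toy_records)
# Gap between the best and worst reasoning type, one simple way to
# quantify variance in performance across types.
spread = max(per_type.values()) - min(per_type.values())
print(per_type, overall, spread)
```

Reporting the per-type breakdown alongside the overall score makes gaps between reasoning types visible that a single aggregate accuracy would hide.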
