MEENA (PersianMMMU): Multimodal-Multilingual Educational Exams for N-level Assessment

August 24, 2025
Authors: Omid Ghahroodi, Arshia Hemmat, Marzia Nouri, Seyed Mohammad Hadi Hosseini, Doratossadat Dastgheib, Mohammad Vali Sanian, Alireza Sahebi, Reihaneh Zohrabi, Mohammad Hossein Rohban, Ehsaneddin Asgari, Mahdieh Soleymani Baghshah
cs.AI

Abstract

Recent advancements in large vision-language models (VLMs) have primarily focused on English, with limited attention given to other languages. To address this gap, we introduce MEENA (also known as PersianMMMU), the first dataset designed to evaluate Persian VLMs across scientific, reasoning, and human-level understanding tasks. Our dataset comprises approximately 7,500 Persian and 3,000 English questions, covering a wide range of topics such as reasoning, mathematics, physics, diagrams, charts, and Persian art and literature. Key features of MEENA include: (1) diverse subject coverage spanning various educational levels, from primary to upper secondary school, (2) rich metadata, including difficulty levels and descriptive answers, (3) original Persian data that preserves cultural nuances, (4) a bilingual structure to assess cross-linguistic performance, and (5) a series of diverse experiments assessing various capabilities, including overall performance, the model's ability to attend to images, and its tendency to generate hallucinations. We hope this benchmark contributes to enhancing VLM capabilities beyond English.