Multimodal Evaluation of Russian-language Architectures

November 19, 2025
Authors: Artem Chervyakov, Ulyana Isaeva, Anton Emelyanov, Artem Safin, Maria Tikhonova, Alexander Kharitonov, Yulia Lyakh, Petr Surovtsev, Denis Shevelev, Vildan Saburov, Vasily Konovalov, Elisei Rykov, Ivan Sviridov, Amina Miftakhova, Ilseyar Alimova, Alexander Panchenko, Alexander Kapitanov, Alena Fenogenova
cs.AI

Abstract

Multimodal large language models (MLLMs) are currently at the center of research attention, showing rapid progress in scale and capabilities, yet their intelligence, limitations, and risks remain insufficiently understood. To address these issues, particularly for the Russian language, where no multimodal benchmarks currently exist, we introduce Mera Multi, an open multimodal evaluation framework for Russian-language architectures. The benchmark is instruction-based and covers text (the default modality), image, audio, and video, comprising 18 newly constructed evaluation tasks for both general-purpose models and modality-specific architectures (image-to-text, video-to-text, and audio-to-text). Our contributions include: (i) a universal taxonomy of multimodal abilities; (ii) 18 datasets created entirely from scratch with attention to Russian cultural and linguistic specificity, together with unified prompts and metrics; (iii) baseline results for both closed-source and open-source models; (iv) a methodology for preventing benchmark leakage, including watermarking and licenses for private sets. While our current focus is on Russian, the proposed benchmark provides a replicable methodology for constructing multimodal benchmarks in typologically diverse languages, particularly within the Slavic language family.