

WorldVQA: Measuring Atomic World Knowledge in Multimodal Large Language Models

January 28, 2026
Authors: Runjie Zhou, Youbo Shao, Haoyu Lu, Bowei Xing, Tongtong Bai, Yujie Chen, Jie Zhao, Lin Sui, Haotian Yao, Zijia Zhao, Hao Yang, Haoning Wu, Zaida Zhou, Jinguo Zhu, Zhiqi Huang, Yiping Bao, Yangyang Liu, Y. Charles, Xinyu Zhou
cs.AI

Abstract

We introduce WorldVQA, a benchmark designed to evaluate the atomic visual world knowledge of Multimodal Large Language Models (MLLMs). Unlike current evaluations, which often conflate visual knowledge retrieval with reasoning, WorldVQA decouples these capabilities to strictly measure "what the model memorizes." The benchmark assesses the atomic capability of grounding and naming visual entities across a stratified taxonomy, spanning from common head-class objects to long-tail rarities. We expect WorldVQA to serve as a rigorous test for visual factuality, thereby establishing a standard for assessing the encyclopedic breadth and hallucination rates of current and next-generation frontier models.
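As a rough illustration of what stratified scoring over such a taxonomy could look like, the sketch below computes exact-match naming accuracy per taxonomy tier (head vs. long-tail) and overall. The record fields, tier labels, and matching rule are assumptions chosen for illustration; they are not taken from the WorldVQA release.

```python
from collections import defaultdict

# Hypothetical evaluation records: each item pairs a model's predicted entity
# name with the gold name and a taxonomy tier ("head" or "tail").
# Field names and tier labels are illustrative, not from the WorldVQA release.
predictions = [
    {"tier": "head", "gold": "golden retriever", "pred": "golden retriever"},
    {"tier": "tail", "gold": "quokka", "pred": "wallaby"},
]

def stratified_accuracy(records):
    """Exact-match naming accuracy, reported per taxonomy tier and overall."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["tier"]] += 1
        hits[r["tier"]] += int(r["pred"].strip().lower() == r["gold"].strip().lower())
    per_tier = {tier: hits[tier] / totals[tier] for tier in totals}
    overall = sum(hits.values()) / sum(totals.values())
    return per_tier, overall

print(stratified_accuracy(predictions))
```

Reporting accuracy separately for head and long-tail tiers, rather than a single pooled score, is what makes gaps in encyclopedic breadth visible: a model can score well overall while failing almost entirely on rare entities.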