DRISHTIKON: A Multimodal Multilingual Benchmark for Testing Language Models' Understanding on Indian Culture
September 23, 2025
Authors: Arijit Maji, Raghvendra Kumar, Akash Ghosh, Anushka, Nemil Shah, Abhilekh Borah, Vanshika Shah, Nishant Mishra, Sriparna Saha
cs.AI
Abstract
We introduce DRISHTIKON, a first-of-its-kind multimodal and multilingual
benchmark centered exclusively on Indian culture, designed to evaluate the
cultural understanding of generative AI systems. Unlike existing benchmarks
with a generic or global scope, DRISHTIKON offers deep, fine-grained coverage
across India's diverse regions, spanning 15 languages, covering all states and
union territories, and incorporating over 64,000 aligned text-image pairs. The
dataset captures rich cultural themes including festivals, attire, cuisines,
art forms, and historical heritage, among many others. We evaluate a wide range
of vision-language models (VLMs), including open-source small and large models,
proprietary systems, reasoning-specialized VLMs, and Indic-focused models,
across zero-shot and chain-of-thought settings. Our results expose key
limitations in current models' ability to reason over culturally grounded,
multimodal inputs, particularly for low-resource languages and less-documented
traditions. DRISHTIKON fills a vital gap in inclusive AI research, offering a
robust testbed to advance culturally aware, multimodally competent language
technologies.