DRISHTIKON: A Multimodal Multilingual Benchmark for Testing Language Models' Understanding on Indian Culture

Abstract

We introduce DRISHTIKON, a first-of-its-kind multimodal and multilingual benchmark centered exclusively on Indian culture and designed to evaluate the cultural understanding of generative AI systems. Unlike existing benchmarks with a generic or global scope, DRISHTIKON offers deep, fine-grained coverage of India's diverse regions: it spans 15 languages, covers all states and union territories, and incorporates over 64,000 aligned text-image pairs. The dataset captures rich cultural themes, including festivals, attire, cuisines, art forms, and historical heritage, among others. We evaluate a wide range of vision-language models (VLMs), including open-source small and large models, proprietary systems, reasoning-specialized VLMs, and Indic-focused models, in zero-shot and chain-of-thought settings. Our results expose key limitations in current models' ability to reason over culturally grounded multimodal inputs, particularly for low-resource languages and less-documented traditions. DRISHTIKON fills a vital gap in inclusive AI research, offering a robust testbed for advancing culturally aware, multimodally competent language technologies.
