Behind Maya: Building a Multilingual Vision Language Model
May 13, 2025
Authors: Nahid Alam, Karthik Reddy Kanjula, Surya Guthikonda, Timothy Chung, Bala Krishna S Vegesna, Abhipsha Das, Anthony Susevski, Ryan Sze-Yin Chan, S M Iftekhar Uddin, Shayekh Bin Islam, Roshan Santhosh, Snegha A, Drishti Sharma, Chen Liu, Isha Chaturvedi, Genta Indra Winata, Ashvanth. S, Snehanshu Mukherjee, Alham Fikri Aji
cs.AI
Abstract
In recent times, we have seen rapid development of large Vision-Language Models (VLMs). They have shown impressive results on academic benchmarks, primarily in widely spoken languages, but underperform on low-resource languages and in varied cultural contexts. To address these limitations, we introduce Maya, an open-source multilingual VLM. Our contributions are: 1) a multilingual image-text pretraining dataset in eight languages, based on the LLaVA pretraining dataset; and 2) a multilingual image-text model supporting these languages, enhancing cultural and linguistic comprehension in vision-language tasks. Code is available at https://github.com/nahidalam/maya.