Behind Maya: Building a Multilingual Vision Language Model

May 13, 2025
作者: Nahid Alam, Karthik Reddy Kanjula, Surya Guthikonda, Timothy Chung, Bala Krishna S Vegesna, Abhipsha Das, Anthony Susevski, Ryan Sze-Yin Chan, S M Iftekhar Uddin, Shayekh Bin Islam, Roshan Santhosh, Snegha A, Drishti Sharma, Chen Liu, Isha Chaturvedi, Genta Indra Winata, Ashvanth. S, Snehanshu Mukherjee, Alham Fikri Aji
cs.AI

Abstract

In recent times, we have seen rapid development of large Vision-Language Models (VLMs). They have shown impressive results on academic benchmarks, primarily in widely spoken languages, but underperform on low-resource languages and in varied cultural contexts. To address these limitations, we introduce Maya, an open-source multilingual VLM. Our contributions are: 1) a multilingual image-text pretraining dataset in eight languages, based on the LLaVA pretraining dataset; and 2) a multilingual image-text model supporting these languages, enhancing cultural and linguistic comprehension in vision-language tasks. Code is available at https://github.com/nahidalam/maya.
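
The abstract states only that the pretraining dataset extends the LLaVA pretraining data to eight languages; it does not describe the construction pipeline. As a rough illustration, the sketch below shows one plausible way to expand an English LLaVA-style record into multiple languages via machine translation. The `translate` helper, the `LANGUAGES` list, and the record schema are assumptions made for illustration, not the authors' documented method.

```python
# A minimal sketch (assumed pipeline, not the authors'): expanding English
# LLaVA-style image-text records into several languages with machine
# translation. `translate` is a hypothetical stand-in for any MT backend,
# and the language list is an assumption.

LANGUAGES = ["en", "zh", "fr", "es", "ru", "hi", "ja", "ar"]  # assumed set of eight

def translate(text: str, target_lang: str) -> str:
    """Hypothetical MT call; plug in a real translation model or API."""
    raise NotImplementedError

def expand_record(record: dict) -> list[dict]:
    """Turn one English record into one translated copy per target language.

    Assumes the LLaVA pretraining JSON schema: each record carries an
    "image" path and a list of {"from", "value"} conversation turns.
    """
    expanded = []
    for lang in LANGUAGES:
        turns = [
            {**turn,
             "value": turn["value"] if lang == "en"
                      else translate(turn["value"], lang)}
            for turn in record["conversations"]
        ]
        expanded.append({"image": record["image"],
                         "language": lang,
                         "conversations": turns})
    return expanded
```

In practice, a pipeline like this would also need translation-quality filtering, especially for low-resource languages, but such details are beyond what the abstract specifies.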

