Behind Maya: Building a Multilingual Vision Language Model

Abstract

Large Vision-Language Models (VLMs) have developed rapidly in recent years. They show impressive results on academic benchmarks, primarily in widely spoken languages, but underperform on low-resource languages and in varied cultural contexts. To address these limitations, we introduce Maya, an open-source multilingual VLM. Our contributions are: 1) a multilingual image-text pretraining dataset in eight languages, based on the LLaVA pretraining dataset; and 2) a multilingual image-text model supporting these languages, enhancing cultural and linguistic comprehension in vision-language tasks. Code is available at https://github.com/nahidalam/maya.
