Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic
July 25, 2024
Authors: Fakhraddin Alwajih, Gagan Bhatia, Muhammad Abdul-Mageed
cs.AI
Abstract
Recent advancements have significantly enhanced the capabilities of Multimodal Large Language Models (MLLMs) in generating and understanding image-to-text content. Despite these successes, progress is predominantly limited to English due to the scarcity of high-quality multimodal resources in other languages. This limitation impedes the development of competitive models in languages such as Arabic. To alleviate this situation, we introduce an efficient Arabic multimodal assistant, dubbed Dallah, that utilizes an advanced language model based on LLaMA-2 to facilitate multimodal interactions. Dallah demonstrates state-of-the-art performance among Arabic MLLMs. Through fine-tuning on six Arabic dialects, Dallah showcases its capability to handle complex dialectal interactions that incorporate both textual and visual elements. The model excels in two benchmark tests: one evaluating its performance on Modern Standard Arabic (MSA) and another specifically designed to assess dialectal responses. Beyond its robust performance in multimodal interaction tasks, Dallah has the potential to pave the way for further development of dialect-aware Arabic MLLMs.