Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic
July 25, 2024
Authors: Fakhraddin Alwajih, Gagan Bhatia, Muhammad Abdul-Mageed
cs.AI
Abstract
Recent advancements have significantly enhanced the capabilities of
Multimodal Large Language Models (MLLMs) in generating and understanding
image-to-text content. Despite these successes, progress is predominantly
limited to English due to the scarcity of high-quality multimodal resources in
other languages. This limitation impedes the development of competitive models
in languages such as Arabic. To alleviate this situation, we introduce an
efficient Arabic multimodal assistant, dubbed Dallah, that utilizes an advanced
language model based on LLaMA-2 to facilitate multimodal interactions. Dallah
demonstrates state-of-the-art performance in Arabic MLLMs. Through fine-tuning on
six Arabic dialects, Dallah showcases its capability to handle complex
dialectal interactions incorporating both textual and visual elements. The
model excels in two benchmark tests: one evaluating its performance on Modern
Standard Arabic (MSA) and another specifically designed to assess dialectal
responses. Beyond its robust performance in multimodal interaction tasks,
Dallah has the potential to pave the way for further development of
dialect-aware Arabic MLLMs.
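The abstract describes Dallah as pairing a vision pathway with a LLaMA-2-based language model to handle interactions that mix text and images. As a rough illustration only, the sketch below shows a typical LLaVA-style wiring for this kind of model: a small projector maps frozen vision-encoder patch features into the language model's embedding space, and the projected image tokens are prepended to the text tokens before decoding. All class names, dimensions, and the two-layer MLP projector here are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch of a LLaVA-style multimodal wiring (illustrative assumption,
# not Dallah's actual implementation).
import torch
import torch.nn as nn


class VisualProjector(nn.Module):
    """Maps vision-encoder patch features into the language model's embedding space."""

    def __init__(self, vision_dim: int = 1024, text_dim: int = 4096):
        super().__init__()
        # Two-layer MLP projector; dimensions are hypothetical.
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, text_dim),
            nn.GELU(),
            nn.Linear(text_dim, text_dim),
        )

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (batch, num_patches, vision_dim) from a frozen image encoder
        return self.proj(patch_feats)  # (batch, num_patches, text_dim)


def build_multimodal_inputs(image_embeds: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
    """Prepend projected image tokens to the text token embeddings
    before feeding the sequence to a LLaMA-2-style decoder."""
    return torch.cat([image_embeds, text_embeds], dim=1)


if __name__ == "__main__":
    projector = VisualProjector()
    fake_patches = torch.randn(1, 576, 1024)  # e.g. ViT patch features (assumed shape)
    fake_text = torch.randn(1, 32, 4096)      # token embeddings from the language model
    inputs = build_multimodal_inputs(projector(fake_patches), fake_text)
    print(inputs.shape)  # torch.Size([1, 608, 4096])
```

The design choice this illustrates is that only the lightweight projector needs to be trained (optionally alongside LoRA-style updates to the language model), which is how such assistants can be built efficiently for lower-resource languages; the exact training recipe Dallah uses is described in the paper itself.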