SHAMI-MT: A Syrian Arabic Dialect to Modern Standard Arabic Bidirectional Machine Translation System

August 4, 2025
Authors: Serry Sibaee, Omer Nacar, Yasser Al-Habashi, Adel Ammar, Wadii Boulila
cs.AI

Abstract

The linguistic landscape of the Arab world is characterized by a significant gap between Modern Standard Arabic (MSA), the language of formal communication, and the diverse regional dialects used in everyday life. This diglossia presents a formidable challenge for natural language processing, particularly machine translation. This paper introduces SHAMI-MT, a bidirectional machine translation system specifically engineered to bridge the communication gap between MSA and the Syrian (Shami) dialect. We present two specialized models, one for MSA-to-Shami and another for Shami-to-MSA translation, both built upon the state-of-the-art AraT5v2-base-1024 architecture. The models were fine-tuned on the comprehensive Nabra dataset and rigorously evaluated on unseen data from the MADAR corpus. Our MSA-to-Shami model achieved an average quality score of 4.01 out of 5.0 when judged by OpenAI's GPT-4.1 model, demonstrating its ability to produce translations that are not only accurate but also dialectally authentic. This work provides a high-fidelity tool for a previously underserved language pair, advancing the field of dialectal Arabic translation and offering significant applications in content localization, cultural heritage, and intercultural communication.
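The reported figure is an average of per-translation ratings assigned by an LLM judge. As a rough illustration of that aggregation step only, the following minimal sketch averages a list of ratings after checking that each lies on the scale. The function name `average_judge_score`, the 1.0 lower bound of the scale, and the example ratings are all assumptions made for illustration; none of them come from the paper, and the judging prompt itself is not shown.

```python
from statistics import mean


def average_judge_score(ratings, low=1.0, high=5.0):
    """Average per-translation judge ratings after a range check.

    `low` and `high` are assumed scale bounds; the abstract only states
    that scores are out of 5.0.
    """
    if not ratings:
        raise ValueError("no ratings to average")
    for rating in ratings:
        if not (low <= rating <= high):
            raise ValueError(f"rating {rating} is outside [{low}, {high}]")
    return mean(ratings)


# Made-up ratings for illustration only; these are not results from the paper.
example_ratings = [4.0, 5.0, 3.0, 4.0, 4.0]
print(round(average_judge_score(example_ratings), 2))
```

Rejecting out-of-range values before averaging is one way to catch malformed judge outputs early, so that a single bad rating does not silently skew the mean.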