ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs

June 26, 2024
Authors: Ahmed Heakl, Youssef Zaghloul, Mennatullah Ali, Rania Hossam, Walid Gomaa
cs.AI

Abstract

Motivated by the growing prevalence of code-switching between Egyptian Arabic and English, this paper explores machine translation (MT) and automatic speech recognition (ASR) systems, focusing on translating code-switched Egyptian Arabic-English into either English or Egyptian Arabic. We present the methodologies employed in developing these systems, which use large language models such as Llama and Gemma. For ASR, we explore the use of the Whisper model for code-switched Egyptian Arabic recognition and detail our experimental procedures, including data preprocessing and training techniques. By implementing a consecutive speech-to-text translation system that integrates ASR with MT, we aim to overcome the challenges posed by limited resources and the unique characteristics of the Egyptian Arabic dialect. Evaluation against established metrics shows promising results: our methodologies yield a significant improvement of 56% in English translation over the state-of-the-art and 9.3% in Arabic translation. Because code-switching is deeply inherent in spoken language, ASR systems must handle it effectively. This capability is essential for seamless interaction in domains such as business negotiations, cultural exchanges, and academic discourse. Our models and code are available as open-source resources. Code: http://github.com/ahmedheakl/arazn-llm, Models: http://huggingface.co/collections/ahmedheakl/arazn-llm-662ceaf12777656607b9524e.
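
The consecutive (cascaded) speech-to-text translation design mentioned in the abstract can be illustrated with a minimal sketch: an ASR stage produces a transcript, which an MT stage then translates into the requested target language. This is not the authors' implementation. The class `CascadeST`, the stub functions, and the placeholder strings are all hypothetical, and the real system would plug in a Whisper-based recognizer and an LLM-based translator in place of the stubs.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CascadeST:
    """Hypothetical two-stage pipeline: ASR first, then MT on the transcript."""
    transcribe: Callable[[bytes], str]      # audio bytes -> code-switched transcript
    translate: Callable[[str, str], str]    # (transcript, target language) -> translation

    def run(self, audio: bytes, target_lang: str) -> Dict[str, str]:
        # Stage 1: recognize the code-switched speech as text.
        transcript = self.transcribe(audio)
        # Stage 2: translate the transcript into the requested language.
        translation = self.translate(transcript, target_lang)
        return {"transcript": transcript, "translation": translation}


def stub_asr(audio: bytes) -> str:
    # Placeholder standing in for a fine-tuned ASR model (assumption).
    return "<code-switched transcript>"


def stub_mt(text: str, target_lang: str) -> str:
    # Placeholder standing in for an LLM-based translator (assumption).
    return f"[{target_lang}] {text}"


pipeline = CascadeST(transcribe=stub_asr, translate=stub_mt)
result = pipeline.run(b"", target_lang="en")
print(result["translation"])
```

Passing the two stages as callables keeps them independently replaceable, which is the main practical appeal of a cascade: either the recognizer or the translator can be swapped or evaluated on its own.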
