ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs

June 26, 2024
Authors: Ahmed Heakl, Youssef Zaghloul, Mennatullah Ali, Rania Hossam, Walid Gomaa
cs.AI

Abstract

Motivated by the recent widespread increase in code-switching between Egyptian Arabic and English, this paper explores the intricacies of machine translation (MT) and automatic speech recognition (ASR) systems, focusing on translating code-switched Egyptian Arabic-English into either English or Egyptian Arabic. Our goal is to present the methodologies used to develop these systems, which build on large language models such as Llama and Gemma. For ASR, we explore the use of the Whisper model for code-switched Egyptian Arabic recognition and detail our experimental procedures, including data preprocessing and training techniques. By implementing a consecutive speech-to-text translation system that integrates ASR with MT, we aim to overcome the challenges posed by limited resources and the unique characteristics of the Egyptian Arabic dialect. Evaluation against established metrics shows promising results: our methodologies yield a significant improvement of 56% in English translation over the state of the art and 9.3% in Arabic translation. Because code-switching is deeply inherent in spoken language, ASR systems must handle this phenomenon effectively. This capability is essential for seamless interaction in domains such as business negotiations, cultural exchanges, and academic discourse. Our models and code are available as open-source resources. Code: http://github.com/ahmedheakl/arazn-llm, Models: http://huggingface.co/collections/ahmedheakl/arazn-llm-662ceaf12777656607b9524e.
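
The consecutive (cascaded) speech-to-text translation setup described in the abstract, where ASR output is passed to MT, can be illustrated with a minimal sketch. This is not the authors' implementation: the `cascade` function, the `Transcriber` and `Translator` aliases, and the two dummy components are hypothetical names introduced here, and the sample sentence is an invented placeholder rather than data from the paper. In a real system the two callables would presumably wrap a fine-tuned Whisper model and a fine-tuned LLM.

```python
from typing import Callable, Dict

# Hypothetical aliases for illustration only (not taken from the paper's code).
Transcriber = Callable[[bytes], str]  # raw audio -> code-switched transcript
Translator = Callable[[str], str]     # transcript -> text in the target language


def cascade(audio: bytes, transcribe: Transcriber, translate: Translator) -> Dict[str, str]:
    """Run ASR first, then feed the resulting transcript to MT."""
    transcript = transcribe(audio)
    translation = translate(transcript)
    # Keep the intermediate transcript so ASR errors can be inspected separately.
    return {"transcript": transcript, "translation": translation}


# Stand-in components so the sketch runs without model weights.
def dummy_transcribe(audio: bytes) -> str:
    # Invented placeholder transcript, not a sample from any dataset.
    return "el meeting bokra"


def dummy_translate(text: str) -> str:
    # Toy lookup table standing in for an MT model; unknown input is echoed back.
    lookup = {"el meeting bokra": "The meeting is tomorrow"}
    return lookup.get(text, text)


result = cascade(b"\x00\x01", dummy_transcribe, dummy_translate)
print(result["translation"])
```

Passing the two stages in as plain callables keeps them independently replaceable, which matches the cascaded design: either stage can be retrained or evaluated on its own without touching the other.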
