
Investigating Decoder-only Large Language Models for Speech-to-text Translation

July 3, 2024
Authors: Chao-Wei Huang, Hui Lu, Hongyu Gong, Hirofumi Inaguma, Ilia Kulikov, Ruslan Mavlyutov, Sravya Popuri
cs.AI

Abstract

Large language models (LLMs), known for their exceptional reasoning capabilities, generalizability, and fluency across diverse domains, present a promising avenue for enhancing speech-related tasks. In this paper, we focus on integrating decoder-only LLMs into the task of speech-to-text translation (S2TT). We propose a decoder-only architecture that enables the LLM to directly consume the encoded speech representation and generate the text translation. Additionally, we investigate the effects of different parameter-efficient fine-tuning techniques and task formulations. Our model achieves state-of-the-art performance on CoVoST 2 and FLEURS among models trained without proprietary data. We also conduct analyses to validate the design choices of our proposed model and offer insights into the integration of LLMs into S2TT.
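The abstract does not include code, but the architecture it describes — a decoder-only LLM that directly consumes encoded speech representations as a prefix and generates the translation — can be illustrated with a minimal PyTorch sketch. Everything below is hypothetical: the class and parameter names (`SpeechPrefixLM`, `speech_proj`, the dimensions) are illustrative, and a small causally masked Transformer stack stands in for the pretrained decoder-only LLM; this is not the authors' implementation.

```python
# Minimal sketch (not the paper's code): a decoder-only LM that consumes
# projected speech-encoder features as prefix embeddings for S2TT.
import torch
import torch.nn as nn

class SpeechPrefixLM(nn.Module):
    def __init__(self, speech_dim=1024, d_model=512, vocab_size=32000,
                 n_layers=4, n_heads=8):
        super().__init__()
        # Adapter: maps frozen speech-encoder outputs into the LLM's
        # embedding space so they can be prepended to text embeddings.
        self.speech_proj = nn.Linear(speech_dim, d_model)
        self.tok_embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # Causally masked self-attention blocks stand in for the
        # pretrained decoder-only LLM.
        self.decoder = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, speech_feats, text_ids):
        # speech_feats: (B, S, speech_dim) from a speech encoder
        # text_ids:     (B, T) target-translation tokens (teacher forcing)
        prefix = self.speech_proj(speech_feats)        # (B, S, d_model)
        tokens = self.tok_embed(text_ids)              # (B, T, d_model)
        x = torch.cat([prefix, tokens], dim=1)         # (B, S+T, d_model)
        causal = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        h = self.decoder(x, mask=causal)
        # Predict over the text positions only; the speech prefix is
        # conditioning context, not a prediction target.
        return self.lm_head(h[:, prefix.size(1):])

model = SpeechPrefixLM()
logits = model(torch.randn(2, 50, 1024), torch.randint(0, 32000, (2, 10)))
print(logits.shape)  # torch.Size([2, 10, 32000])
```

In this setup, parameter-efficient fine-tuning of the kind the paper investigates would typically freeze the LLM weights and train only the adapter (and, e.g., low-rank updates to the attention projections), rather than updating the full model.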
