Investigating Decoder-only Large Language Models for Speech-to-text Translation

July 3, 2024
Authors: Chao-Wei Huang, Hui Lu, Hongyu Gong, Hirofumi Inaguma, Ilia Kulikov, Ruslan Mavlyutov, Sravya Popuri
cs.AI

Abstract

Large language models (LLMs), known for their exceptional reasoning capabilities, generalizability, and fluency across diverse domains, present a promising avenue for enhancing speech-related tasks. In this paper, we focus on integrating decoder-only LLMs into the task of speech-to-text translation (S2TT). We propose a decoder-only architecture that enables the LLM to directly consume the encoded speech representation and generate the text translation. Additionally, we investigate the effects of different parameter-efficient fine-tuning techniques and task formulations. Our model achieves state-of-the-art performance on CoVoST 2 and FLEURS among models trained without proprietary data. We also conduct analyses to validate the design choices of our proposed model and offer insights into the integration of LLMs for S2TT.
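The core idea in the abstract, a decoder-only LM that directly consumes encoded speech representations and generates the translation, can be illustrated with a short sketch. The snippet below is a minimal, hypothetical PyTorch example, not the authors' released implementation: it assumes a HuggingFace-style causal LM, and the class name `DecoderOnlyS2TT`, the single linear projection, and all dimensions are illustrative assumptions. Speech-encoder outputs are projected into the LLM's embedding space and prepended to the text tokens, and the LM loss is computed only on the text positions.

```python
# Hypothetical sketch of a decoder-only S2TT model (illustrative, not the paper's code).
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM


class DecoderOnlyS2TT(nn.Module):
    def __init__(self, llm_name: str, speech_dim: int):
        super().__init__()
        self.llm = AutoModelForCausalLM.from_pretrained(llm_name)
        llm_dim = self.llm.get_input_embeddings().embedding_dim
        # Project speech-encoder outputs into the LLM embedding space (assumed design).
        self.proj = nn.Linear(speech_dim, llm_dim)

    def forward(self, speech_feats: torch.Tensor, text_ids: torch.Tensor):
        # speech_feats: (B, T_s, speech_dim), from a pretrained speech encoder
        # text_ids:     (B, T_t), target translation token ids
        speech_emb = self.proj(speech_feats)                  # (B, T_s, D)
        text_emb = self.llm.get_input_embeddings()(text_ids)  # (B, T_t, D)
        inputs = torch.cat([speech_emb, text_emb], dim=1)
        # Compute the LM loss only on text positions; label -100 is
        # ignored by the model's cross-entropy.
        ignore = torch.full(speech_emb.shape[:2], -100,
                            dtype=text_ids.dtype, device=text_ids.device)
        labels = torch.cat([ignore, text_ids], dim=1)
        return self.llm(inputs_embeds=inputs, labels=labels)
```

Under a formulation like this, one plausible place to apply the parameter-efficient fine-tuning techniques the abstract mentions (e.g., LoRA) is the LLM weights, while the small projection layer is trained in full; the paper's analyses compare such design choices.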
