
Large-scale Language Model Rescoring on Long-form Data

June 13, 2023
Authors: Tongzhou Chen, Cyril Allauzen, Yinghui Huang, Daniel Park, David Rybach, W. Ronny Huang, Rodrigo Cabrera, Kartik Audhkhasi, Bhuvana Ramabhadran, Pedro J. Moreno, Michael Riley
cs.AI

Abstract

In this work, we study the impact of Large-scale Language Models (LLMs) on Automatic Speech Recognition (ASR) of YouTube videos, which we use as a source for long-form ASR. We demonstrate up to 8% relative reduction in Word Error Rate (WER) on US English (en-us) and code-switched Indian English (en-in) long-form ASR test sets, and up to 30% relative reduction in Salient Term Error Rate (STER), over a strong first-pass baseline that uses a maximum-entropy based language model. Improved lattice processing, which produces a lattice with a proper (non-tree) digraph topology and carries context from the 1-best hypothesis of the previous segment(s), yields significant wins in rescoring with LLMs. We also find that the gains in performance from combining LLMs trained on vast quantities of available data (such as C4) with conventional neural LMs are additive and significantly outperform a strong first-pass baseline with a maximum-entropy LM.
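To make the second-pass idea concrete, below is a minimal, illustrative sketch of LLM rescoring with previous-segment context. It simplifies the paper's approach: it rescores an n-best list per segment rather than a full lattice with a proper digraph topology, and the function names, score interpolation, and `lm_weight` value are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch: interpolate first-pass scores with LLM log-probabilities,
# conditioning the LLM on the 1-best hypothesis of the previous segment(s).
# `llm_log_prob` is a user-supplied callable (an assumption, not a real API).
from typing import Callable, List, Tuple

def rescore_segment(
    hypotheses: List[Tuple[str, float]],        # (text, first-pass log score)
    previous_best: str,                         # 1-best hypothesis of previous segment(s)
    llm_log_prob: Callable[[str, str], float],  # log P(text | context) under the LLM
    lm_weight: float = 0.5,                     # illustrative interpolation weight
) -> str:
    """Return the hypothesis with the best interpolated score."""
    def combined(hyp: Tuple[str, float]) -> float:
        text, first_pass = hyp
        return (1.0 - lm_weight) * first_pass + lm_weight * llm_log_prob(text, previous_best)

    return max(hypotheses, key=combined)[0]

def rescore_long_form(
    segments: List[List[Tuple[str, float]]],
    llm_log_prob: Callable[[str, str], float],
) -> List[str]:
    """Rescore segments in order, threading each segment's 1-best forward as context."""
    context, outputs = "", []
    for hyps in segments:
        best = rescore_segment(hyps, context, llm_log_prob)
        outputs.append(best)
        context = best  # carry the 1-best hypothesis to the next segment
    return outputs
```

In the paper's setting, the same principle applies to lattice arcs rather than whole hypotheses, which is where the proper (non-tree) digraph topology matters.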