Large-scale Language Model Rescoring on Long-form Data

June 13, 2023
Authors: Tongzhou Chen, Cyril Allauzen, Yinghui Huang, Daniel Park, David Rybach, W. Ronny Huang, Rodrigo Cabrera, Kartik Audhkhasi, Bhuvana Ramabhadran, Pedro J. Moreno, Michael Riley
cs.AI

Abstract

In this work, we study the impact of Large-scale Language Models (LLM) on Automated Speech Recognition (ASR) of YouTube videos, which we use as a source for long-form ASR. We demonstrate up to 8% relative reduction in Word Error Rate (WER) on US English (en-us) and code-switched Indian English (en-in) long-form ASR test sets, and a relative reduction of up to 30% in Salient Term Error Rate (STER), over a strong first-pass baseline that uses a maximum-entropy language model. Improved lattice processing that yields a lattice with a proper (non-tree) digraph topology and carries context from the 1-best hypothesis of the previous segment(s) results in significant wins in rescoring with LLMs. We also find that the gains in performance from combining LLMs trained on vast quantities of available data (such as C4) with conventional neural LMs are additive, and significantly outperform a strong first-pass baseline with a maximum-entropy LM.
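To make the rescoring setup concrete, below is a minimal Python sketch of second-pass rescoring with cross-segment context carry-over and log-linear score combination. It simplifies the paper's lattice rescoring to an n-best list; the scorer callables, interpolation weights (`lam_llm`, `lam_nlm`), and all helper names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of second-pass LLM rescoring with cross-segment
# context, loosely following the abstract. The LLM scorer, hypothesis
# container, and weights below are illustrative assumptions, not the
# paper's actual lattice-rescoring pipeline.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    text: str                 # candidate transcript for the current segment
    first_pass_score: float   # log-score from the first-pass ASR model


def rescore_segment(
    hyps: List[Hypothesis],
    prev_context: str,
    llm_log_prob: Callable[[str, str], float],
    neural_lm_log_prob: Callable[[str], float],
    lam_llm: float = 0.5,
    lam_nlm: float = 0.3,
) -> Hypothesis:
    """Pick the best hypothesis by log-linearly combining the first-pass
    score with LLM and conventional neural-LM scores. The LLM score is
    conditioned on the 1-best transcript of the previous segment(s)."""
    def combined(h: Hypothesis) -> float:
        return (
            h.first_pass_score
            + lam_llm * llm_log_prob(prev_context, h.text)
            + lam_nlm * neural_lm_log_prob(h.text)
        )
    return max(hyps, key=combined)


def rescore_long_form(
    segments: List[List[Hypothesis]],
    llm_log_prob: Callable[[str, str], float],
    neural_lm_log_prob: Callable[[str], float],
) -> List[str]:
    """Process segments left to right, carrying the chosen 1-best
    transcript forward as context for the next segment."""
    context, outputs = "", []
    for hyps in segments:
        best = rescore_segment(hyps, context, llm_log_prob, neural_lm_log_prob)
        outputs.append(best.text)
        context = best.text  # carry context to the next segment
    return outputs
```

The two ideas mirrored here are (a) conditioning the LLM on the previous segment's 1-best transcript, which the abstract reports as a significant win for long-form ASR, and (b) adding the LLM and conventional neural-LM scores on top of the first-pass score, reflecting the additive gains from combining the two model types.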