

Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling

November 1, 2023
Authors: Sanchit Gandhi, Patrick von Platen, Alexander M. Rush
cs.AI

Abstract

As the size of pre-trained speech recognition models increases, running these large models in low-latency or resource-constrained environments becomes challenging. In this work, we leverage pseudo-labelling to assemble a large-scale open-source dataset which we use to distill the Whisper model into a smaller variant, called Distil-Whisper. Using a simple word error rate (WER) heuristic, we select only the highest quality pseudo-labels for training. The distilled model is 5.8 times faster with 51% fewer parameters, while performing to within 1% WER on out-of-distribution test data in a zero-shot transfer setting. Distil-Whisper maintains the robustness of the Whisper model to difficult acoustic conditions, while being less prone to hallucination errors on long-form audio. Distil-Whisper is designed to be paired with Whisper for speculative decoding, yielding a 2 times speed-up while mathematically ensuring the same outputs as the original model. To facilitate further research in this domain, we make our training code, inference code and models publicly accessible.
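The WER heuristic for selecting pseudo-labels is simple to reproduce in principle: compare each Whisper-generated transcription against the ground-truth reference and keep only examples whose WER falls below a threshold. The snippet below is a minimal sketch of that idea, assuming the `jiwer` package for WER computation; the `filter_pseudo_labels` helper and the 10% threshold are illustrative placeholders, not the paper's exact implementation.

```python
# Sketch of WER-based pseudo-label filtering (illustrative only).
from jiwer import wer

def filter_pseudo_labels(pairs, max_wer=0.10):
    """Keep (reference, pseudo_label) pairs whose WER is at most `max_wer`."""
    kept = []
    for reference, pseudo_label in pairs:
        if not reference.strip():
            continue  # skip empty references, for which WER is undefined
        if wer(reference, pseudo_label) <= max_wer:
            kept.append((reference, pseudo_label))
    return kept

# Example usage:
pairs = [
    ("the cat sat on the mat", "the cat sat on the mat"),   # WER 0.0 -> kept
    ("the cat sat on the mat", "a dog stood by the door"),  # high WER -> dropped
]
print(len(filter_pseudo_labels(pairs)))  # 1
```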
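Because Distil-Whisper shares Whisper's tokenizer, it can serve as the draft model in speculative (assisted) decoding, with the full Whisper model verifying the drafted tokens. The following is a minimal sketch using the Hugging Face transformers assisted-generation API (the `assistant_model` argument to `generate`); the silent placeholder audio clip and the device/dtype handling are illustrative assumptions, not part of the paper.

```python
import numpy as np
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# Full Whisper model (verifier) and the distilled assistant (drafter).
processor = AutoProcessor.from_pretrained("openai/whisper-large-v2")
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-large-v2", torch_dtype=dtype
).to(device)
assistant = AutoModelForSpeechSeq2Seq.from_pretrained(
    "distil-whisper/distil-large-v2", torch_dtype=dtype
).to(device)

# Placeholder audio: 5 seconds of silence sampled at 16 kHz.
audio = np.zeros(5 * 16000, dtype=np.float32)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
input_features = inputs.input_features.to(device, dtype)

# The assistant drafts tokens that the full model verifies, so the output
# matches greedy decoding with Whisper alone while running faster.
generated_ids = model.generate(input_features, assistant_model=assistant)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```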