
Quantization for OpenAI's Whisper Models: A Comparative Analysis

March 12, 2025
Author: Allison Andreyev
cs.AI

Abstract

Automatic speech recognition (ASR) models have gained prominence for applications such as captioning, speech translation, and live transcription. This paper studies Whisper and two model variants: one optimized for live speech streaming and another for offline transcription. Notably, these models have been found to generate hallucinated content, which reduces transcription reliability. Furthermore, larger model variants exhibit higher latency and are harder to deploy on resource-constrained devices. This study analyzes the similarities and differences between the three Whisper models and qualitatively examines their distinct capabilities. It then quantifies the impact of model quantization on latency and evaluates its viability for edge deployment. Using the open-source LibriSpeech dataset, this paper evaluates the word error rate (WER) and latency of whispercpp under three quantization methods (INT4, INT5, INT8). Results show that quantization reduces latency by 19% and model size by 45% while preserving transcription accuracy. These findings provide insights into the optimal use cases of different Whisper models and the possibilities for edge device deployment. All code, datasets, and implementation details are available in a public GitHub repository: https://github.com/allisonandreyev/WhisperQuantization.git
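
For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how a word error rate and a relative reduction are commonly computed. It is not taken from the paper's repository: the function names `word_error_rate` and `relative_reduction` are hypothetical, and the text normalization (lowercasing and whitespace splitting) is an assumption that may differ from what the author used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: text is normalized by lowercasing and splitting on whitespace.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, quantized: float) -> float:
    """Fractional reduction of a measurement (latency, file size) vs. a baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - quantized) / baseline
```

Under this definition, a transcript that drops one word from a six-word reference has a WER of 1/6, and a reported 19% latency reduction corresponds to `relative_reduction(baseline, quantized)` returning roughly 0.19.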
