Quantization for OpenAI's Whisper Models: A Comparative Analysis

Abstract

Automated speech recognition (ASR) models have gained prominence in applications such as captioning, speech translation, and live transcription. This paper studies Whisper and two of its model variants: one optimized for live speech streaming and one for offline transcription. Notably, these models have been found to generate hallucinated content, which reduces transcription reliability. In addition, larger model variants exhibit higher latency and are difficult to deploy on resource-constrained devices. This study analyzes the similarities and differences among the three Whisper models and qualitatively examines their distinct capabilities. It then quantifies the impact of model quantization on latency and evaluates its viability for edge deployment. Using the open-source LibriSpeech dataset, the paper measures the word error rate (WER) and latency of whispercpp under three quantization methods (INT4, INT5, INT8). Results show that quantization reduces latency by 19% and model size by 45% while preserving transcription accuracy. These findings provide insight into the best use cases for the different Whisper models and the feasibility of edge-device deployment. All code, datasets, and implementation details are available in a public GitHub repository: https://github.com/allisonandreyev/WhisperQuantization.git
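
For readers unfamiliar with the metric, word error rate is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code from the repository above; the function name `word_error_rate` is hypothetical, and the whitespace tokenization is an assumption (the paper's pipeline may apply text normalization such as lowercasing or punctuation removal before scoring).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly;
    no case folding or punctuation stripping is applied here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"):
    # 2 errors over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference, which is relevant when a model hallucinates extra content.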
