

LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation

February 27, 2025
Authors: Keisuke Kamahori, Jungo Kasai, Noriyuki Kojima, Baris Kasikci
cs.AI

Abstract

Modern automatic speech recognition (ASR) models, such as OpenAI's Whisper, rely on deep encoder-decoder architectures, and their encoders are a critical bottleneck for efficient deployment due to high computational intensity. We introduce LiteASR, a low-rank compression scheme for ASR encoders that significantly reduces inference costs while maintaining transcription accuracy. Our approach leverages the strong low-rank properties observed in intermediate activations: by applying principal component analysis (PCA) with a small calibration dataset, we approximate linear transformations with a chain of low-rank matrix multiplications, and further optimize self-attention to work in the reduced dimension. Evaluation results show that our method can compress Whisper large-v3's encoder size by over 50%, matching Whisper medium's size with better transcription accuracy, thereby establishing a new Pareto-optimal frontier of efficiency and performance. The code of LiteASR is available at https://github.com/efeslab/LiteASR.
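The core idea described in the abstract — using PCA on a small set of calibration activations to replace a dense linear layer with a chain of two thin matrix multiplications — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the shapes, the synthetic low-rank activations, and the variable names (`W`, `X`, `Pk`, `A`, `B`) are assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes for illustration (not the paper's configuration):
# a linear layer y = x @ W.T with W of shape (d_out, d_in), and a small
# calibration set X of intermediate activations that is approximately low-rank.
d_in, d_out, n, k = 256, 256, 512, 64
W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
X = rng.standard_normal((n, 48)) @ rng.standard_normal((48, d_in))  # rank-48 activations

# PCA via SVD on the layer's calibration outputs: keep the top-k principal axes.
Y = X @ W.T                        # (n, d_out) calibration outputs
_, _, Vt = np.linalg.svd(Y, full_matrices=False)
Pk = Vt[:k]                        # (k, d_out) top-k principal directions

# Factor the dense layer into a chain of two low-rank matmuls: y ≈ (x @ A) @ B.
A = W.T @ Pk.T                     # (d_in, k): project input into the k-dim subspace
B = Pk                             # (k, d_out): map back to the output space

Y_approx = (X @ A) @ B
rel_err = np.linalg.norm(Y - Y_approx) / np.linalg.norm(Y)
print(f"relative reconstruction error: {rel_err:.2e}")
print(f"params: dense={W.size}, low-rank={A.size + B.size}")
```

Because the synthetic activations here have rank below `k`, the projection reconstructs the outputs almost exactly while storing `d_in*k + k*d_out` parameters instead of `d_out*d_in`; in the paper this trade-off is what shrinks the Whisper encoder while preserving accuracy.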

