

Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis

July 13, 2024
作者: Xilin Jiang, Yinghao Aaron Li, Adrian Nicolas Florea, Cong Han, Nima Mesgarani
cs.AI

Abstract

It is too early to conclude that Mamba is a better alternative to transformers for speech before comparing Mamba with transformers in terms of both performance and efficiency in multiple speech-related tasks. To reach this conclusion, we propose and evaluate three models for three tasks: Mamba-TasNet for speech separation, ConMamba for speech recognition, and VALL-M for speech synthesis. We compare them with transformers of similar sizes in performance, memory, and speed. Our Mamba or Mamba-transformer hybrid models show comparable or higher performance than their transformer counterparts: Sepformer, Conformer, and VALL-E. They are more efficient than transformers in memory and speed for speech longer than a threshold duration, inversely related to the resolution of a speech token. Mamba for separation is the most efficient, and Mamba for recognition is the least. Further, we show that Mamba is not more efficient than transformer for speech shorter than the threshold duration and performs worse in models that require joint modeling of text and speech, such as cross or masked attention of two inputs. Therefore, we argue that the superiority of Mamba or transformer depends on particular problems and models. Code available at https://github.com/xi-j/Mamba-TasNet and https://github.com/xi-j/Mamba-ASR.
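The efficiency claim rests on how the two architectures scale with sequence length: self-attention cost grows quadratically with the number of speech tokens, while Mamba's state-space scan grows roughly linearly, so a crossover duration appears beyond which Mamba wins on memory and speed. The sketch below is not the authors' benchmark code; it is a minimal illustration, assuming a CUDA device and the `mamba-ssm` package, that profiles a single Mamba block against a `torch.nn.TransformerEncoderLayer` over growing sequence lengths. The layer width, batch size, and lengths are illustrative assumptions, not values from the paper.

```python
# Minimal profiling sketch (illustrative only, not the paper's setup):
# compare wall-clock time and peak GPU memory of one Mamba block vs. one
# transformer encoder layer as the sequence length grows.
import time
import torch
from torch import nn
from mamba_ssm import Mamba  # https://github.com/state-spaces/mamba

D_MODEL = 512  # assumed model width for illustration

mamba = Mamba(d_model=D_MODEL, d_state=16, d_conv=4, expand=2).cuda().eval()
attn = nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=8, batch_first=True).cuda().eval()

@torch.no_grad()
def profile(layer, seq_len, batch=1, warmup=3, iters=10):
    """Return (mean seconds per forward pass, peak memory in MB)."""
    x = torch.randn(batch, seq_len, D_MODEL, device="cuda")
    for _ in range(warmup):
        layer(x)
    torch.cuda.synchronize()
    torch.cuda.reset_peak_memory_stats()
    start = time.time()
    for _ in range(iters):
        layer(x)
    torch.cuda.synchronize()
    elapsed = (time.time() - start) / iters
    peak_mb = torch.cuda.max_memory_allocated() / 2**20
    return elapsed, peak_mb

# Longer sequences widen the gap; very long ones may exhaust memory for attention.
for seq_len in (1_000, 4_000, 16_000):
    t_m, m_m = profile(mamba, seq_len)
    t_a, m_a = profile(attn, seq_len)
    print(f"L={seq_len:>6}  Mamba: {t_m*1e3:6.1f} ms / {m_m:7.1f} MB   "
          f"Attention: {t_a*1e3:6.1f} ms / {m_a:7.1f} MB")
```

Such a sweep also makes the abstract's resolution argument concrete: separation operates on high-rate, waveform-level tokens, so a given audio duration produces the longest token sequence and crosses the efficiency threshold earliest, whereas recognition uses coarser tokens and crosses it latest.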
