

StressTest: Can YOUR Speech LM Handle the Stress?

May 28, 2025
Authors: Iddo Yosha, Gallil Maimon, Yossi Adi
cs.AI

Abstract

Sentence stress refers to emphasis placed on specific words within a spoken utterance to highlight or contrast an idea, or to introduce new information. It is often used to imply an underlying intention that is not explicitly stated. Recent advances in speech-aware language models (SLMs) have enabled direct processing of audio, allowing models to bypass transcription, access the full richness of the speech signal, and perform audio reasoning tasks such as spoken question answering. Despite the crucial role of sentence stress in shaping meaning and speaker intent, it remains largely overlooked in the evaluation and development of such models. In this work, we address this gap by introducing StressTest, a benchmark specifically designed to evaluate a model's ability to distinguish between interpretations of spoken sentences based on their stress pattern. We assess the performance of several leading SLMs and find that, despite their overall capabilities, they perform poorly on such tasks. To overcome this limitation, we propose a novel synthetic data generation pipeline and create Stress17k, a training set that simulates changes in meaning implied by stress variation. We then show empirically that optimizing models with this synthetic dataset aligns well with real-world recordings and enables effective finetuning of SLMs. Results suggest that our finetuned model, StresSLM, significantly outperforms existing models on both sentence stress reasoning and detection tasks. Code, models, data, and audio samples: pages.cs.huji.ac.il/adiyoss-lab/stresstest.
May 30, 2025