StressTest: Can YOUR Speech LM Handle the Stress?
May 28, 2025
Authors: Iddo Yosha, Gallil Maimon, Yossi Adi
cs.AI
Abstract
Sentence stress refers to emphasis placed on specific words within a spoken
utterance to highlight or contrast an idea, or to introduce new information. It
is often used to imply an underlying intention that is not explicitly stated.
Recent advances in speech-aware language models (SLMs) have enabled direct
processing of audio, allowing models to bypass transcription, access the
full richness of the speech signal, and perform audio reasoning tasks such as
spoken question answering. Despite the crucial role of sentence stress in
shaping meaning and speaker intent, it remains largely overlooked in the evaluation
and development of such models. In this work, we address this gap by
introducing StressTest, a benchmark specifically designed to evaluate a model's
ability to distinguish between interpretations of spoken sentences based on the
stress pattern. We assess the performance of several leading SLMs and find
that, despite their overall capabilities, they perform poorly on such tasks. To
overcome this limitation, we propose a novel synthetic data generation
pipeline and create Stress17k, a training set that simulates the change of meaning
implied by stress variation. Then, we empirically show that optimizing models
with this synthetic dataset aligns well with real-world recordings and enables
effective finetuning of SLMs. Results suggest that our finetuned model,
StresSLM, significantly outperforms existing models on both sentence stress
reasoning and detection tasks. Code, models, data, and audio samples are
available at pages.cs.huji.ac.il/adiyoss-lab/stresstest.