
WHISTRESS: Enriching Transcriptions with Sentence Stress Detection

May 25, 2025
Authors: Iddo Yosha, Dorin Shteyman, Yossi Adi
cs.AI

Abstract

Spoken language conveys meaning not only through words but also through intonation, emotion, and emphasis. Sentence stress, the emphasis placed on specific words within a sentence, is crucial for conveying speaker intent and has been extensively studied in linguistics. In this work, we introduce WHISTRESS, an alignment-free approach for enhancing transcription systems with sentence stress detection. To support this task, we propose TINYSTRESS-15K, a scalable, synthetic training dataset for sentence stress detection, produced by a fully automated dataset creation process. We train WHISTRESS on TINYSTRESS-15K and evaluate it against several competitive baselines. Our results show that WHISTRESS outperforms existing methods while requiring no additional input priors during training or inference. Notably, despite being trained on synthetic data, WHISTRESS demonstrates strong zero-shot generalization across diverse benchmarks. Project page: https://pages.cs.huji.ac.il/adiyoss-lab/whistress.
