WHISTRESS: Enriching Transcriptions with Sentence Stress Detection
May 25, 2025
Authors: Iddo Yosha, Dorin Shteyman, Yossi Adi
cs.AI
Abstract
Spoken language conveys meaning not only through words but also through
intonation, emotion, and emphasis. Sentence stress, the emphasis placed on
specific words within a sentence, is crucial for conveying speaker intent and
has been extensively studied in linguistics. In this work, we introduce
WHISTRESS, an alignment-free approach for enhancing transcription systems with
sentence stress detection. To support this task, we propose TINYSTRESS-15K, a
scalable, synthetic training dataset for sentence stress detection, produced by
a fully automated dataset creation process. We train
WHISTRESS on TINYSTRESS-15K and evaluate it against several competitive
baselines. Our results show that WHISTRESS outperforms existing methods while
requiring no additional input priors during training or inference. Notably,
despite being trained on synthetic data, WHISTRESS demonstrates strong
zero-shot generalization across diverse benchmarks. Project page:
https://pages.cs.huji.ac.il/adiyoss-lab/whistress.