WHISTRESS: Enriching Transcriptions with Sentence Stress Detection

Abstract

Spoken language conveys meaning not only through words but also through intonation, emotion, and emphasis. Sentence stress, the emphasis placed on specific words within a sentence, is crucial for conveying speaker intent and has been extensively studied in linguistics. In this work, we introduce WHISTRESS, an alignment-free approach for enhancing transcription systems with sentence stress detection. To support this task, we propose TINYSTRESS-15K, a scalable, synthetic training dataset for sentence stress detection, produced by a fully automated dataset creation process. We train WHISTRESS on TINYSTRESS-15K and evaluate it against several competitive baselines. Our results show that WHISTRESS outperforms existing methods while requiring no additional input priors during training or inference. Notably, despite being trained on synthetic data, WHISTRESS demonstrates strong zero-shot generalization across diverse benchmarks. Project page: https://pages.cs.huji.ac.il/adiyoss-lab/whistress.

