WHISTRESS: Enriching Transcriptions with Sentence Stress Detection

Abstract

Spoken language conveys meaning not only through words but also through intonation, emotion, and emphasis. Sentence stress, the emphasis placed on specific words within a sentence, is crucial for conveying speaker intent and has been extensively studied in linguistics. In this work, we introduce WHISTRESS, an alignment-free approach for enhancing transcription systems with sentence stress detection. To support this task, we propose TINYSTRESS-15K, a scalable, synthetic training dataset for sentence stress detection, produced by a fully automated dataset creation process. We train WHISTRESS on TINYSTRESS-15K and evaluate it against several competitive baselines. Our results show that WHISTRESS outperforms existing methods while requiring no additional input priors during training or inference. Notably, despite being trained on synthetic data, WHISTRESS demonstrates strong zero-shot generalization across diverse benchmarks. Project page: https://pages.cs.huji.ac.il/adiyoss-lab/whistress.
