HiKE: Hierarchical Evaluation Framework for Korean-English Code-Switching Speech Recognition

Abstract

Despite advances in multilingual automatic speech recognition (ASR), code-switching (CS), the mixing of languages within an utterance that is common in daily speech, remains a severely underexplored challenge. In this paper, we introduce HiKE, the Hierarchical Korean-English code-switching benchmark. It is the first globally accessible evaluation framework for Korean-English CS, and it aims to provide a means for the precise evaluation of multilingual ASR models and to foster research in the field. The framework consists of high-quality, natural CS data across various topics, and it also provides meticulous loanword labels and a hierarchical CS-level labeling scheme (word, phrase, and sentence). Together these enable a systematic evaluation of a model's ability to handle each distinct level of code-switching. Through evaluations of diverse multilingual ASR models and fine-tuning experiments, this paper demonstrates that while most multilingual ASR models initially struggle with CS-ASR, this capability can be enabled through fine-tuning with CS data. HiKE will be available at https://github.com/ThetaOne-AI/HiKE.
