HiKE: Hierarchical Evaluation Framework for Korean-English Code-Switching Speech Recognition

Abstract

Despite advances in multilingual automatic speech recognition (ASR), code-switching (CS), the mixing of languages within an utterance that is common in everyday speech, remains severely underexplored. In this paper, we introduce HiKE (the Hierarchical Korean-English code-switching benchmark), the first globally accessible evaluation framework for Korean-English CS. HiKE aims to provide a means for precise evaluation of multilingual ASR models and to foster research in the field. The framework consists of high-quality, natural CS data across various topics, and it provides meticulous loanword labels together with a hierarchical CS-level labeling scheme (word, phrase, and sentence), enabling systematic evaluation of a model's ability to handle each level of code-switching. Through evaluations of diverse multilingual ASR models and fine-tuning experiments, we show that although most multilingual ASR models initially struggle with CS-ASR, this capability can be enabled by fine-tuning on CS data. HiKE will be available at https://github.com/ThetaOne-AI/HiKE.
