Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features

Abstract

A challenge in developing NLP systems for the world's languages is understanding how they generalize to typological differences relevant for real-world applications. To this end, we propose M2C, a morphologically-aware framework for behavioral testing of NLP models. We use M2C to generate tests that probe models' behavior in light of specific linguistic features in 12 typologically diverse languages. We evaluate state-of-the-art language models on the generated tests. While models excel at most tests in English, we highlight generalization failures to specific typological characteristics such as temporal expressions in Swahili and compounding possessives in Finnish. Our findings motivate the development of models that address these blind spots.

Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features
