Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features

Abstract

A challenge in developing NLP systems for the world's languages is understanding how they generalize to typological differences relevant to real-world applications. To this end, we propose M2C, a morphologically-aware framework for behavioral testing of NLP models. We use M2C to generate tests that probe models' behavior with respect to specific linguistic features in 12 typologically diverse languages, and we evaluate state-of-the-art language models on the generated tests. While models excel at most tests in English, we highlight generalization failures on specific typological characteristics, such as temporal expressions in Swahili and compounding possessives in Finnish. Our findings motivate the development of models that address these blind spots.
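As a rough illustration of what a morphologically-aware behavioral test can look like, the Python sketch below fills a question-answering style template (context, question, expected answer) so that the noun agrees in grammatical number with the count, then scores a model by exact match. This is an assumption-laden toy, not M2C's implementation: the lexicon format, the QA framing, and the names `TemplateCase`, `generate_tests`, `exact_match_accuracy`, and `naive_model` are all hypothetical. English number agreement stands in for the richer features the paper targets, such as Swahili temporal expressions or Finnish possessives.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical toy lexicon: each noun lemma maps a morphological feature
# value (grammatical number) to its inflected form. Illustrative only.
LEXICON = {
    "apple": {"SG": "apple", "PL": "apples"},
    "book": {"SG": "book", "PL": "books"},
}
NAMES = ["Maria", "Juma"]
COUNTS = [1, 3]


@dataclass
class TemplateCase:
    context: str
    question: str
    answer: str


def number_feature(n: int) -> str:
    # English uses the singular form only for a count of exactly one.
    return "SG" if n == 1 else "PL"


def generate_tests() -> list[TemplateCase]:
    """Fill a QA-style template so that the noun agrees with the count."""
    tests = []
    for name, lemma, n in product(NAMES, LEXICON, COUNTS):
        noun = LEXICON[lemma][number_feature(n)]  # inflect by feature value
        plural = LEXICON[lemma]["PL"]  # "How many X" always takes the plural
        tests.append(TemplateCase(
            context=f"{name} has {n} {noun}.",
            question=f"How many {plural} does {name} have?",
            answer=str(n),
        ))
    return tests


def exact_match_accuracy(model, tests: list[TemplateCase]) -> float:
    """Share of tests where model(context, question) equals the expected answer."""
    correct = sum(model(t.context, t.question).strip() == t.answer for t in tests)
    return correct / len(tests)


def naive_model(context: str, question: str) -> str:
    # Toy stand-in for a real QA model: return the first all-digit token.
    for token in context.split():
        if token.isdigit():
            return token
    return ""


tests = generate_tests()
print(len(tests))  # 8
print(tests[0].context, tests[0].question)  # Maria has 1 apple. How many apples does Maria have?
print(exact_match_accuracy(naive_model, tests))  # 1.0
```

In this sketch each template slot is filled by inflecting a lemma according to a single feature value, so a generated test set isolates one morphological phenomenon at a time; replacing `naive_model` with a real model and the lexicon with richer feature values would follow the same pattern.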
