Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features

Abstract

A challenge in developing NLP systems for the world's languages is understanding how they generalize to typological differences relevant for real-world applications. To this end, we propose M2C, a morphologically-aware framework for behavioral testing of NLP models. We use M2C to generate tests that probe models' behavior in light of specific linguistic features in 12 typologically diverse languages. We evaluate state-of-the-art language models on the generated tests. While models excel at most tests in English, we highlight generalization failures to specific typological characteristics such as temporal expressions in Swahili and compounding possessives in Finnish. Our findings motivate the development of models that address these blind spots.
