Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features

Abstract

A challenge in developing NLP systems for the world's languages is understanding how they generalize to typological differences relevant for real-world applications. To this end, we propose M2C, a morphologically-aware framework for behavioral testing of NLP models. We use M2C to generate tests that probe models' behavior in light of specific linguistic features in 12 typologically diverse languages. We evaluate state-of-the-art language models on the generated tests. While models excel at most tests in English, we highlight generalization failures to specific typological characteristics such as temporal expressions in Swahili and compounding possessives in Finnish. Our findings motivate the development of models that address these blind spots.
