The Same But Different: Structural Similarities and Differences in Multilingual Language Modeling
October 11, 2024
Authors: Ruochen Zhang, Qinan Yu, Matianyu Zang, Carsten Eickhoff, Ellie Pavlick
cs.AI
Abstract
We employ new tools from mechanistic interpretability in order to ask whether the internal structure of large language models (LLMs) shows correspondence to the linguistic structures which underlie the languages on which they are trained. In particular, we ask (1) when two languages employ the same morphosyntactic processes, do LLMs handle them using shared internal circuitry? and (2) when two languages require different morphosyntactic processes, do LLMs handle them using different internal circuitry? Using English and Chinese multilingual and monolingual models, we analyze the internal circuitry involved in two tasks. We find evidence that models employ the same circuit to handle the same syntactic process independently of the language in which it occurs, and that this is the case even for monolingual models trained completely independently. Moreover, we show that multilingual models employ language-specific components (attention heads and feed-forward networks) when needed to handle linguistic processes (e.g., morphological marking) that only exist in some languages. Together, our results provide new insights into how LLMs trade off between exploiting common structures and preserving linguistic differences when tasked with modeling multiple languages simultaneously.
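For readers unfamiliar with the circuit-analysis workflow the abstract refers to, below is a minimal, illustrative sketch of activation patching, one standard mechanistic-interpretability technique for localizing the model components responsible for a behavior. The model, prompts, and subject-verb agreement metric are hypothetical placeholders chosen for illustration, not the paper's actual experimental setup.

```python
# A sketch of activation patching with the TransformerLens library: cache
# activations from a "clean" prompt, splice them into a "corrupted" run, and
# see which layers restore the clean behavior. Real circuit analyses patch
# finer-grained components (individual attention heads, MLPs, positions).
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # placeholder model

# Hypothetical subject-verb agreement prompts (not the paper's stimuli).
clean_prompt = "The keys to the cabinet"    # plural subject  -> " are"
corrupt_prompt = "The key to the cabinet"   # singular subject -> " is"

clean_tokens = model.to_tokens(clean_prompt)
corrupt_tokens = model.to_tokens(corrupt_prompt)  # same token length here

# Run the clean prompt once and cache every intermediate activation.
_, clean_cache = model.run_with_cache(clean_tokens)

ARE = model.to_single_token(" are")
IS = model.to_single_token(" is")

def agreement_score(logits):
    """Logit difference between plural and singular verb at the last position."""
    return (logits[0, -1, ARE] - logits[0, -1, IS]).item()

# Patch each layer's residual stream from the clean run into the corrupted run.
# Layers whose patch restores a positive score are candidate circuit locations.
for layer in range(model.cfg.n_layers):
    hook_name = f"blocks.{layer}.hook_resid_post"

    def patch_resid(activation, hook):
        return clean_cache[hook.name]  # overwrite with the cached clean activation

    patched_logits = model.run_with_hooks(
        corrupt_tokens, fwd_hooks=[(hook_name, patch_resid)]
    )
    print(f"layer {layer:2d}: agreement score {agreement_score(patched_logits):+.3f}")
```

Components whose patched activations restore the clean-run behavior are candidate members of the circuit; comparing which components matter across languages is the kind of analysis that can distinguish shared from language-specific structure.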