The Same But Different: Structural Similarities and Differences in Multilingual Language Modeling

October 11, 2024
Authors: Ruochen Zhang, Qinan Yu, Matianyu Zang, Carsten Eickhoff, Ellie Pavlick
cs.AI

Abstract

We employ new tools from mechanistic interpretability in order to ask whether the internal structure of large language models (LLMs) shows correspondence to the linguistic structures which underlie the languages on which they are trained. In particular, we ask (1) when two languages employ the same morphosyntactic processes, do LLMs handle them using shared internal circuitry? and (2) when two languages require different morphosyntactic processes, do LLMs handle them using different internal circuitry? Using English and Chinese multilingual and monolingual models, we analyze the internal circuitry involved in two tasks. We find evidence that models employ the same circuit to handle the same syntactic process independently of the language in which it occurs, and that this is the case even for monolingual models trained completely independently. Moreover, we show that multilingual models employ language-specific components (attention heads and feed-forward networks) when needed to handle linguistic processes (e.g., morphological marking) that only exist in some languages. Together, our results provide new insights into how LLMs trade off between exploiting common structures and preserving linguistic differences when tasked with modeling multiple languages simultaneously.
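To make the "shared internal circuitry" question concrete, the sketch below illustrates activation patching, a standard mechanistic-interpretability technique for testing whether a model component contributes to a behavior: an activation from a run on one language's prompt is substituted into a run on the other language's prompt, and the change in the next-token prediction is measured. This is an illustrative sketch only, not the paper's code; the model name, layer index, and prompts are placeholder assumptions.

```python
# Minimal activation-patching sketch (illustrative; not the paper's implementation).
# We cache one attention block's output on an English prompt and patch it into the
# forward pass on a Chinese prompt at the final token position, then compare
# next-token distributions. MODEL_NAME, LAYER, and the prompts are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"      # placeholder; a multilingual model would be used in practice
LAYER = 6                # which transformer block's attention output to patch
EN_PROMPT = "The keys to the cabinet"   # illustrative agreement-style prompt
ZH_PROMPT = "柜子的钥匙"                 # rough Chinese counterpart (illustrative)

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

attn_module = model.transformer.h[LAYER].attn   # GPT-2 layout; other models differ
cached = {}

def cache_hook(module, inputs, output):
    # The attention module may return a tuple; element 0 is the attention output.
    out = output[0] if isinstance(output, tuple) else output
    cached["act"] = out.detach()

def patch_hook(module, inputs, output):
    is_tuple = isinstance(output, tuple)
    act = (output[0] if is_tuple else output).clone()
    # Patch only the final token's activation, sidestepping the length mismatch
    # between the two languages' tokenizations.
    act[:, -1, :] = cached["act"][:, -1, :]
    return ((act,) + tuple(output[1:])) if is_tuple else act

with torch.no_grad():
    # 1) Run the English prompt and cache the chosen attention output.
    handle = attn_module.register_forward_hook(cache_hook)
    model(**tok(EN_PROMPT, return_tensors="pt"))
    handle.remove()

    # 2) Run the Chinese prompt normally, then again with the cached English
    #    activation patched in, and compare next-token distributions.
    zh_inputs = tok(ZH_PROMPT, return_tensors="pt")
    base_logits = model(**zh_inputs).logits[0, -1]

    handle = attn_module.register_forward_hook(patch_hook)
    patched_logits = model(**zh_inputs).logits[0, -1]
    handle.remove()

shift = (patched_logits.softmax(-1) - base_logits.softmax(-1)).abs().sum().item()
print(f"Change in next-token distribution after cross-language patching: {shift:.4f}")
```

Under this style of analysis, a small shift when patching components across languages for the same syntactic process is consistent with shared circuitry, while a large shift for language-specific processes (such as morphological marking) points to language-specific components.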
