Linear Correlation in LM's Compositional Generalization and Hallucination

Abstract

The generalization of language models (LMs) is the subject of active debate, which contrasts their potential for general intelligence with their struggles with basic knowledge composition (e.g., the reversal/transition curse). This paper uncovers the phenomenon of linear correlations in LMs during knowledge composition. Specifically, between certain pieces of related knowledge there exists a linear transformation that maps the next-token prediction logits of one prompt to those of another, e.g., "X lives in the city of" → "X lives in the country of" for every given X. This mirrors the linearity of human knowledge composition, such as Paris → France. Our findings indicate that the linear transformation is resilient to large-scale fine-tuning: it generalizes updated knowledge when it is aligned with real-world relationships, but causes hallucinations when it deviates from them. Empirical results suggest that linear correlation can serve as a potential identifier of an LM's generalization. Finally, we show that such linear correlations can be learned with a single feedforward network and pre-trained vocabulary representations, indicating that LM generalization relies heavily on the latter.
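The linear transformation between prompts can be illustrated with a small numerical sketch. It assumes that the map is fit by ordinary least squares on pairs of logit vectors collected over many subjects X, and that its quality is scored by the mean per-subject Pearson correlation on held-out subjects. The helper names `fit_linear_map` and `mean_pearson`, the city/country membership matrix, and all logits are hypothetical synthetic stand-ins, not outputs of a real LM and not necessarily the paper's exact procedure. Because the aligned target is constructed to be linear in the source plus noise, its high correlation holds by construction; the unrelated target shows what the score looks like when no such relation exists.

```python
import numpy as np


def fit_linear_map(src, tgt):
    """Fit tgt ~= src @ W + b by ordinary least squares.

    src: (n_subjects, n_src_tokens) logits for the source prompt
    tgt: (n_subjects, n_tgt_tokens) logits for the target prompt
    """
    n = src.shape[0]
    # Append a column of ones so the bias b is fit jointly with W.
    src_aug = np.hstack([src, np.ones((n, 1))])
    coef, *_ = np.linalg.lstsq(src_aug, tgt, rcond=None)
    return coef[:-1], coef[-1]  # W: (n_src, n_tgt), b: (n_tgt,)


def mean_pearson(pred, actual):
    """Average per-subject Pearson correlation between predicted and actual logits."""
    pc = pred - pred.mean(axis=1, keepdims=True)
    ac = actual - actual.mean(axis=1, keepdims=True)
    r = (pc * ac).sum(axis=1) / (np.linalg.norm(pc, axis=1) * np.linalg.norm(ac, axis=1))
    return float(r.mean())


rng = np.random.default_rng(0)
n_subjects, n_cities, n_countries = 400, 30, 5

# Hypothetical ground truth: city i lies in country i % n_countries.
membership = np.zeros((n_cities, n_countries))
membership[np.arange(n_cities), np.arange(n_cities) % n_countries] = 1.0

# Stand-ins for next-token logits of "X lives in the city of" (one row per X).
city_logits = rng.normal(scale=3.0, size=(n_subjects, n_cities))
# Aligned target: each country's logit averages the logits of its cities, plus noise.
country_logits = city_logits @ membership / (n_cities // n_countries)
country_logits += rng.normal(scale=0.1, size=country_logits.shape)
# Deviating target: logits with no relation to the source prompt.
unrelated_logits = rng.normal(scale=1.0, size=country_logits.shape)

train, test = slice(0, 300), slice(300, None)
results = {}
for name, tgt in [("aligned", country_logits), ("unrelated", unrelated_logits)]:
    W, b = fit_linear_map(city_logits[train], tgt[train])
    pred = city_logits[test] @ W + b  # apply the fitted map to unseen subjects
    results[name] = mean_pearson(pred, tgt[test])
    print(f"{name}: mean held-out Pearson r = {results[name]:.3f}")
```

With a real model, the rows of `city_logits` and `country_logits` would presumably be the next-token logits of each prompt, restricted to the relevant city and country tokens, for each subject X. Fitting the bias jointly with `W` through an appended ones column keeps the whole fit to a single `np.linalg.lstsq` call.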
