

Linear Correlation in LM's Compositional Generalization and Hallucination

February 6, 2025
Authors: Letian Peng, Chenyang An, Shibo Hao, Chengyu Dong, Jingbo Shang
cs.AI

Abstract

The generalization of language models (LMs) is under active debate, contrasting their potential for general intelligence with their struggles with basic knowledge composition (e.g., the reverse/transition curse). This paper uncovers the phenomenon of linear correlations in LMs during knowledge composition. To illustrate, there exists a linear transformation between certain related knowledge that maps the next-token prediction logits from one prompt to another, e.g., "X lives in the city of" → "X lives in the country of" for every given X. This mirrors the linearity in human knowledge composition, such as Paris → France. Our findings indicate that the linear transformation is resilient to large-scale fine-tuning, generalizing updated knowledge when aligned with real-world relationships but causing hallucinations when it deviates. Empirical results suggest that linear correlation can serve as a potential identifier of the LM's generalization. Finally, we show that such linear correlations can be learned with a single feedforward network and pre-trained vocabulary representations, indicating that LM generalization relies heavily on the latter.
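The linear map described in the abstract can be probed directly: collect the next-token logits for a set of subjects X under the two prompt templates, then fit a least-squares transformation W (plus bias b) between them. Below is a minimal sketch, assuming GPT-2 as a stand-in model; the subject names and the small candidate-token sets (cities and countries) are illustrative assumptions, not the authors' exact setup, which fits the map over larger vocabulary subsets.

```python
# Minimal sketch of the logit-level probe: does a single linear map send
# "city" logits to "country" logits across subjects X?
# Assumptions (not from the paper): GPT-2 as the model, the subject names,
# and the small candidate-token sets below.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

subjects = ["Alice", "Bob", "Carol", "David", "Emma", "Frank", "Grace", "Henry"]
cities = [" Paris", " London", " Tokyo", " Berlin", " Rome"]
countries = [" France", " England", " Japan", " Germany", " Italy"]

def first_token_id(word: str) -> int:
    # Use the first sub-token as a proxy for the word's logit slot.
    return tok(word).input_ids[0]

city_ids = torch.tensor([first_token_id(w) for w in cities])
country_ids = torch.tensor([first_token_id(w) for w in countries])

@torch.no_grad()
def next_logits(prompt: str, token_ids: torch.Tensor) -> torch.Tensor:
    ids = tok(prompt, return_tensors="pt").input_ids
    return model(ids).logits[0, -1, token_ids]  # logits restricted to candidates

# Source/target logits for every subject X, using the abstract's templates.
src = torch.stack([next_logits(f"{x} lives in the city of", city_ids) for x in subjects])
tgt = torch.stack([next_logits(f"{x} lives in the country of", country_ids) for x in subjects])

# Fit tgt ≈ [src | 1] @ [W; b] by least squares; a low residual signals a
# linear correlation between the two prompts' next-token distributions.
A = torch.cat([src, torch.ones(len(subjects), 1)], dim=1)
Wb = torch.linalg.lstsq(A, tgt).solution
print("mean |residual|:", ((A @ Wb) - tgt).abs().mean().item())
```

In the paper's framing, such a fitted map generalizes knowledge edits faithfully only when its dominant weights line up with real-world pairs (e.g., Paris → France); a map that correlates strongly but misaligns with the world is precisely where composition turns into hallucination.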

