How new data permeates LLM knowledge and how to dilute it

Abstract

Large language models learn, and continue learning, through the accumulation of gradient-based updates, but how individual pieces of new information affect existing knowledge, leading to both beneficial generalization and problematic hallucination, remains poorly understood. We demonstrate that when learning new information, LLMs exhibit a "priming" effect: learning a new fact can cause the model to inappropriately apply that knowledge in unrelated contexts. To systematically study this phenomenon, we introduce "Outlandish," a carefully curated dataset of 1,320 diverse text samples designed to probe how new knowledge permeates through an LLM's existing knowledge base. Using this dataset, we show that the degree of priming after learning new information can be predicted by measuring the token probability of key words before learning. This relationship holds robustly across different model architectures (PaLM-2, Gemma, Llama), sizes, and training stages. Finally, we develop two novel techniques to modulate how new knowledge affects existing model behavior: (1) a "stepping-stone" text augmentation strategy and (2) an "ignore-k" update pruning method. These approaches reduce undesirable priming effects by 50-95% while preserving the model's ability to learn new information. Our findings provide both empirical insights into how LLMs learn and practical tools for improving the specificity of knowledge insertion in language models. Further materials: https://sunchipsster1.github.io/projects/outlandish/
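As a rough illustration of the pre-learning measurement described in the abstract, the sketch below computes the probability a model assigns to a keyword from raw logits. It assumes that "token probability of key words" means the product of per-token softmax probabilities. The function names and toy logits are hypothetical and are not taken from the authors' code.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def keyword_probability(step_logits: Sequence[Sequence[float]],
                        keyword_token_ids: Sequence[int]) -> float:
    """Probability assigned to a (possibly multi-token) keyword.

    step_logits[i] is the logit vector at the position where keyword
    token i is predicted. This input layout is an assumption; in practice
    the logits would come from a forward pass of the model under study.
    """
    if len(step_logits) != len(keyword_token_ids):
        raise ValueError("need one logit vector per keyword token")
    total_log_prob = 0.0
    for logits, token_id in zip(step_logits, keyword_token_ids):
        total_log_prob += log_softmax(logits)[token_id]
    return math.exp(total_log_prob)


# Toy example: a 4-token vocabulary and a two-token keyword.
toy_logits = [[0.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0]]
print(keyword_probability(toy_logits, [2, 0]))
```

Under this reading, a keyword the model already finds likely before training yields a value near 1, while a surprising keyword yields a value near 0. The abstract reports that this pre-learning quantity predicts the degree of priming after learning.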
