What Is Seen Cannot Be Unseen: The Disruptive Effect of Knowledge Conflict on Large Language Models

Abstract

Large language models (LLMs) frequently rely on both contextual input and parametric knowledge to perform tasks. However, these sources can come into conflict, especially when retrieved documents contradict the model's parametric knowledge. We propose a diagnostic framework to systematically evaluate LLM behavior under context-memory conflict, where the contextual information diverges from the model's parametric beliefs. We construct diagnostic data that elicit these conflicts and analyze model performance across multiple task types. Our findings reveal that (1) knowledge conflict has minimal impact on tasks that do not require knowledge utilization, (2) model performance is consistently higher when contextual and parametric knowledge are aligned, (3) models are unable to fully suppress their internal knowledge even when instructed to do so, and (4) providing rationales that explain the conflict increases reliance on the context. These insights raise concerns about the validity of model-based evaluation and underscore the need to account for knowledge conflict when deploying LLMs.
