Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency

January 9, 2026
Authors: Haoming Xu, Ningyuan Zhao, Yunzhi Yao, Weihong Xu, Hongru Wang, Xinle Deng, Shumin Deng, Jeff Z. Pan, Huajun Chen, Ningyu Zhang
cs.AI

Abstract

As Large Language Models (LLMs) are increasingly deployed in real-world settings, correctness alone is insufficient. Reliable deployment requires maintaining truthful beliefs under contextual perturbations. Existing evaluations largely rely on point-wise confidence measures such as Self-Consistency, which can mask brittle beliefs. We show that even facts answered with perfect self-consistency can rapidly collapse under mild contextual interference. To address this gap, we propose Neighbor-Consistency Belief (NCB), a structural measure of belief robustness that evaluates response coherence across a conceptual neighborhood. To validate the effectiveness of NCB, we introduce a new cognitive stress-testing protocol that probes output stability under contextual interference. Experiments across multiple LLMs show that performance on high-NCB data degrades less under interference. Finally, we present Structure-Aware Training (SAT), which optimizes for context-invariant belief structure and reduces long-tail knowledge brittleness by approximately 30%. Code will be available at https://github.com/zjunlp/belief.
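The abstract describes NCB only at a high level. To make the contrast with point-wise Self-Consistency concrete, below is a minimal illustrative sketch, assuming a simple agreement-based formulation over a hand-built set of neighboring questions. The helper names (`query_model`, `self_consistency`, `neighbor_consistency_belief`) and the scoring rule are assumptions for illustration, not the paper's actual definition; see the released code for the authors' implementation.

```python
# Illustrative sketch only: the paper's exact NCB formulation is not given in the
# abstract. This assumes a simple agreement-based score over a hand-built
# "conceptual neighborhood" of related questions. `query_model` is a hypothetical
# stub standing in for any LLM call.
from collections import Counter
from typing import Callable, List


def query_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; replace with a real client."""
    raise NotImplementedError


def self_consistency(question: str, ask: Callable[[str], str], n_samples: int = 8) -> float:
    """Point-wise confidence: fraction of sampled answers agreeing with the modal answer."""
    answers = [ask(question).strip().lower() for _ in range(n_samples)]
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / n_samples


def neighbor_consistency_belief(
    target_question: str,
    expected_answer: str,
    neighbor_questions: List[str],  # related probes of the same underlying concept
    ask: Callable[[str], str],
) -> float:
    """Structural score: how coherently the model answers across the neighborhood.

    Here it is simply the fraction of probes (target plus neighbors) whose answer
    contains the expected fact; the actual NCB metric may differ.
    """
    probes = [target_question] + neighbor_questions
    hits = sum(expected_answer.lower() in ask(q).lower() for q in probes)
    return hits / len(probes)
```

Under this reading, a fact can score 1.0 on self-consistency yet low on the neighborhood score if related probes contradict it, which is the kind of brittleness the paper's cognitive stress-testing protocol is designed to expose.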