
When Should Models Change Their Minds? Contextual Belief Management in Large Language Models

May 28, 2026
Authors: Haoming Xu, Weihong Xu, Zongrui Li, Mengru Wang, Yunzhi Yao, Chiyu Wu, Jin Shang, Yu Gong, Shumin Deng
cs.AI

Abstract

Long-horizon interactions require language models to manage accumulating information: when to update their state, when to preserve it, and what to ignore. We study this challenge as Contextual Belief Management (CBM): maintaining a predicted belief state aligned with formal evidence while isolating task-irrelevant noise. To make CBM measurable, we introduce BeliefTrack, a closed-world benchmark spanning Rule Discovery and Circuit Diagnosis, where a finite belief space and symbolic verifiers enable exact turn-level evaluation. BeliefTrack diagnoses three failure modes: Failed Stay, Failed Update, and Failed Isolation. Across multiple LLMs, vanilla models exhibit severe CBM failures, while explicit belief-tracking prompts provide limited gains. In contrast, reinforcement learning with belief-state rewards reduces failure rates by 70.9% on average. Further probing reveals latent belief-state dynamics behind these failures, and representation-level steering reduces failure rates by 46.1% on average across the two tasks. Code is coming soon at https://github.com/zjunlp/CBM.
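To make the idea of exact turn-level evaluation over a finite belief space more concrete, the following is a minimal Python sketch. It is not the BeliefTrack implementation, and the abstract does not define the three failure modes precisely. The sketch assumes one plausible reading: Failed Isolation is a wrong belief state after a task-irrelevant turn, Failed Update is a wrong belief state after evidence that should change the belief, and Failed Stay is a wrong belief state after evidence that should leave it unchanged. The names `Turn`, `classify_turn`, and `failure_rates` are hypothetical, and the gold belief states are assumed to come from a symbolic verifier.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional

# A belief state is modeled as the set of hypotheses still considered possible.
# This representation is an assumption made for illustration.
Belief = FrozenSet[str]

FAILURE_MODES = ("failed_stay", "failed_update", "failed_isolation")


@dataclass(frozen=True)
class Turn:
    is_noise: bool           # True if the turn carries only task-irrelevant content
    gold_before: Belief      # verifier's belief state before the turn
    gold_after: Belief       # verifier's belief state after the turn
    predicted_after: Belief  # model's reported belief state after the turn


def classify_turn(turn: Turn) -> Optional[str]:
    """Return the assumed failure mode for one turn, or None if it is correct."""
    if turn.predicted_after == turn.gold_after:
        return None
    if turn.is_noise:
        return "failed_isolation"   # noise should not have moved the belief
    if turn.gold_after != turn.gold_before:
        return "failed_update"      # evidence required a change that was missed
    return "failed_stay"            # evidence required no change, but belief moved


def failure_rates(turns: List[Turn]) -> Dict[str, float]:
    """Fraction of all turns falling into each assumed failure mode."""
    if not turns:
        return {}
    counts = Counter(
        label for label in map(classify_turn, turns) if label is not None
    )
    return {mode: counts[mode] / len(turns) for mode in FAILURE_MODES}


if __name__ == "__main__":
    b = frozenset
    demo = [
        Turn(False, b({"r1", "r2", "r3"}), b({"r1", "r2"}), b({"r1", "r2"})),
        Turn(False, b({"r1", "r2"}), b({"r1", "r2"}), b({"r1"})),
        Turn(True, b({"r1", "r2"}), b({"r1", "r2"}), b({"r2"})),
        Turn(False, b({"r1", "r2"}), b({"r1"}), b({"r1", "r2"})),
    ]
    print(failure_rates(demo))
```

Because the belief space is finite and the gold state is computed symbolically, each turn can be checked by exact set equality, with no need for fuzzy matching of free-form answers. The benchmark's actual definitions and normalization may differ from this sketch.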