When Should Models Change Their Minds? Contextual Belief Management in Large Language Models
May 28, 2026
Authors: Haoming Xu, Weihong Xu, Zongrui Li, Mengru Wang, Yunzhi Yao, Chiyu Wu, Jin Shang, Yu Gong, Shumin Deng
cs.AI
Abstract
Long-horizon interactions require language models to manage accumulating information: when to update their state, when to preserve it, and what to ignore. We study this challenge as Contextual Belief Management (CBM): maintaining a predicted belief state aligned with formal evidence while isolating task-irrelevant noise. To make CBM measurable, we introduce BeliefTrack, a closed-world benchmark spanning Rule Discovery and Circuit Diagnosis, where a finite belief space and symbolic verifiers enable exact turn-level evaluation. BeliefTrack diagnoses three failure modes: Failed Stay, Failed Update, and Failed Isolation. Across multiple LLMs, vanilla models exhibit severe CBM failures, while explicit belief-tracking prompts provide limited gains. In contrast, reinforcement learning with belief-state rewards reduces failure rates by 70.9% on average. Further probing reveals latent belief-state dynamics behind these failures, and representation-level steering reduces failure rates by 46.1% across two tasks. Code is coming soon at https://github.com/zjunlp/CBM.
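To make the idea of exact turn-level evaluation over a finite belief space more concrete, the following is a minimal sketch and not the BeliefTrack implementation. The candidate rules, the `verify` and `classify_turn` helpers, and in particular the operational definitions of the three failure modes are assumptions inferred from their names: Failed Isolation is taken to mean a wrong belief after a task-irrelevant turn, Failed Stay a wrong belief after relevant evidence that should leave the belief unchanged, and Failed Update a wrong belief after evidence that should change it.

```python
from typing import Callable, Dict, FrozenSet, Optional, Tuple

# Hypothetical finite belief space: candidate rules over integers.
RULES: Dict[str, Callable[[int], bool]] = {
    "even": lambda x: x % 2 == 0,
    "positive": lambda x: x > 0,
    "multiple_of_4": lambda x: x % 4 == 0,
}

# A turn carries either labeled evidence (x, label) or None for noise.
Evidence = Optional[Tuple[int, bool]]


def verify(belief: FrozenSet[str], evidence: Evidence) -> FrozenSet[str]:
    """Toy symbolic verifier: keep only rules consistent with the evidence."""
    if evidence is None:  # noise turn: the gold belief must not move
        return belief
    x, label = evidence
    return frozenset(r for r in belief if RULES[r](x) == label)


def classify_turn(
    gold_prev: FrozenSet[str], evidence: Evidence, predicted: FrozenSet[str]
) -> str:
    """Label one turn under the assumed failure definitions."""
    gold_new = verify(gold_prev, evidence)
    if predicted == gold_new:
        return "ok"
    if evidence is None:
        return "failed_isolation"  # belief was disturbed by noise
    if gold_new == gold_prev:
        return "failed_stay"  # belief should have been preserved
    return "failed_update"  # belief should have changed to gold_new


start = frozenset(RULES)
after_first = verify(start, (2, True))  # rules out "multiple_of_4"
results = [
    classify_turn(start, (2, True), start),
    classify_turn(after_first, (6, True), frozenset({"even"})),
    classify_turn(after_first, None, frozenset({"positive"})),
    classify_turn(after_first, None, after_first),
]
```

Because the belief space is finite and the verifier is symbolic, each turn's gold belief is computed exactly, so a predicted belief can be compared by set equality without a learned judge.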