Query-Level Uncertainty in Large Language Models
June 11, 2025
作者: Lihu Chen, Gaël Varoquaux
cs.AI
Abstract
It is important for Large Language Models to be aware of the boundaries of
their knowledge, i.e., to have a mechanism for distinguishing known from
unknown queries. This kind of awareness enables adaptive inference, such as
invoking retrieval-augmented generation (RAG), engaging in slow and deep
thinking, or abstaining from answering, and is beneficial to the development
of efficient and trustworthy AI. In this work, we propose a method to detect
knowledge boundaries via Query-Level Uncertainty, which aims to determine
whether the model can address a given query before generating any tokens. To
this end, we introduce a novel, training-free method called Internal
Confidence, which leverages self-evaluations across layers and tokens.
Empirical results on both factual QA and mathematical reasoning tasks
demonstrate that Internal Confidence outperforms several baselines.
Furthermore, we show that the proposed method can be used for efficient RAG
and model cascading, reducing inference costs while maintaining performance.