Query-Level Uncertainty in Large Language Models
June 11, 2025
作者: Lihu Chen, Gaël Varoquaux
cs.AI
Abstract
It is important for Large Language Models to be aware of the boundaries of
their knowledge, i.e., to have a mechanism for distinguishing known from
unknown queries. This kind of awareness enables adaptive inference, such as
invoking retrieval-augmented generation (RAG), engaging in slow and deep
thinking, or abstaining from answering, and is beneficial to the development
of efficient and trustworthy AI. In this work, we propose a method to detect
knowledge boundaries via Query-Level Uncertainty, which aims to determine
whether the model can address a given query before generating any tokens. To
this end, we introduce a novel, training-free method called Internal
Confidence, which leverages self-evaluations across layers and tokens.
Empirical results on both factual QA and mathematical reasoning tasks
demonstrate that Internal Confidence outperforms several baselines.
Furthermore, we show that the proposed method can be used for efficient RAG
and model cascading, reducing inference costs while maintaining performance.