Query-Level Uncertainty in Large Language Models

Abstract

Large language models (LLMs) need to be aware of the boundary of their knowledge, that is, to distinguish queries they can answer from those they cannot. This awareness enables adaptive inference, such as invoking retrieval-augmented generation (RAG), engaging in slow and deep thinking, or abstaining, which supports the development of efficient and trustworthy AI. In this work, we propose detecting knowledge boundaries via Query-Level Uncertainty, which aims to determine whether the model can address a given query without generating any tokens. To this end, we introduce Internal Confidence, a novel, training-free method that leverages self-evaluations across layers and tokens. Empirical results on both factual QA and mathematical reasoning tasks demonstrate that Internal Confidence can outperform several baselines. Furthermore, we show that the proposed method can be used for efficient RAG and model cascading, reducing inference costs while maintaining performance.
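The adaptive-inference idea in the abstract (answer directly, invoke RAG or a larger model, or abstain) can be illustrated with a small routing sketch. This is not the paper's implementation: the confidence score is assumed to be computed elsewhere and passed in as a number in [0, 1], and the names `Action`, `route_query`, `low`, and `high`, along with the threshold values, are hypothetical choices made only for illustration.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical inference strategies a router could choose between."""
    ANSWER_DIRECTLY = "answer_directly"
    RETRIEVE_OR_ESCALATE = "retrieve_or_escalate"  # e.g. invoke RAG or a larger model
    ABSTAIN = "abstain"


def route_query(confidence: float, low: float = 0.3, high: float = 0.7) -> Action:
    """Pick a strategy from a query-level confidence score in [0, 1].

    The score is assumed to come from some pre-generation estimator;
    the default thresholds are illustrative and would need tuning.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if not low <= high:
        raise ValueError("low must not exceed high")
    if confidence >= high:
        # The model is likely to know the answer: skip the extra cost.
        return Action.ANSWER_DIRECTLY
    if confidence >= low:
        # Uncertain: spend more compute (retrieval or a stronger model).
        return Action.RETRIEVE_OR_ESCALATE
    # Very low confidence: decline rather than risk a wrong answer.
    return Action.ABSTAIN
```

Because the score is available before any tokens are generated, a router of this kind could skip retrieval or a larger model for queries the base model is likely to handle on its own, which is where the cost savings mentioned above would come from.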
