HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs

Abstract


An Achilles heel of Large Language Models (LLMs) is their tendency to hallucinate non-factual statements. A response that mixes factual and non-factual statements is hard for humans to verify and to base accurate decisions on. To combat this problem, we propose Highlighted Chain-of-Thought Prompting (HoT), a technique for prompting LLMs to generate responses with XML tags that ground facts to those provided in the query. That is, given an input question, an LLM first re-formats the question, adding XML tags that highlight key facts, and then generates a response with highlights over the facts referenced from the input. Interestingly, in few-shot settings, HoT outperforms vanilla chain-of-thought prompting (CoT) on 17 diverse tasks spanning arithmetic, reading comprehension, and logical reasoning. When humans are asked to verify LLM responses, the highlights help time-limited participants recognize more accurately and efficiently when LLMs are correct. Yet, surprisingly, when LLMs are wrong, HoT tends to make users believe that an answer is correct.
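To make the idea of tag-based grounding concrete, the following is a minimal Python sketch of a post-hoc check on an HoT-style output. It assumes a hypothetical tag scheme in which the reformatted question and the answer both wrap key facts in numbered tags such as `<fact1>...</fact1>`; the tag names and the helpers `extract_facts`, `check_grounding`, and `strip_tags` are illustrative assumptions, not code or a format taken from the HoT paper. The check only confirms that every tag highlighted in the answer also appears in the question, so it can flag highlights with no source in the input, but it cannot tell whether a correctly tagged fact is used correctly in the reasoning.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed (hypothetical) tag scheme: numbered highlight tags like <fact1>...</fact1>.
# The closing name must match the opening one (\1 backreference); nested tags
# are not supported by this simple pattern.
TAG_RE = re.compile(r"<(fact\d+)>(.*?)</\1>", re.DOTALL)


@dataclass
class GroundingReport:
    question_facts: dict[str, str]      # tag -> first span highlighted in the question
    answer_facts: dict[str, list[str]]  # tag -> all spans highlighted in the answer
    ungrounded: list[str]               # answer tags with no counterpart in the question


def extract_facts(text: str) -> dict[str, list[str]]:
    """Group every highlighted span in `text` by its tag name."""
    facts: dict[str, list[str]] = {}
    for tag, span in TAG_RE.findall(text):
        facts.setdefault(tag, []).append(span.strip())
    return facts


def check_grounding(reformatted_question: str, answer: str) -> GroundingReport:
    """Flag answer highlights whose tag never occurs in the reformatted question."""
    q_facts = extract_facts(reformatted_question)
    a_facts = extract_facts(answer)
    ungrounded = sorted(tag for tag in a_facts if tag not in q_facts)
    return GroundingReport(
        question_facts={tag: spans[0] for tag, spans in q_facts.items()},
        answer_facts=a_facts,
        ungrounded=ungrounded,
    )


def strip_tags(text: str) -> str:
    """Drop the highlight tags but keep the enclosed text (a plain, unhighlighted view)."""
    return TAG_RE.sub(lambda m: m.group(2), text)


if __name__ == "__main__":
    question = ("Tom has <fact1>3 apples</fact1> and buys <fact2>5 more</fact2>. "
                "How many apples does he have now?")
    answer = ("Tom starts with <fact1>3 apples</fact1> and buys <fact2>5 more</fact2>, "
              "so 3 + 5 = 8. He has <fact3>8 apples</fact3>.")
    report = check_grounding(question, answer)
    print(report.ungrounded)  # ['fact3']: highlighted in the answer, absent from the question
    print(strip_tags(answer))
```

In the toy run, tagging the computed total as `fact3` is flagged because, under the assumed scheme, highlights are meant to point back to facts in the input; the correctly grounded `fact1` and `fact2` pass even though the check never inspects the arithmetic.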
