HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs
March 3, 2025
Authors: Tin Nguyen, Logan Bolton, Mohammad Reza Taesiri, Anh Totti Nguyen
cs.AI
Abstract
An Achilles heel of Large Language Models (LLMs) is their tendency to hallucinate non-factual statements. A response that mixes factual and non-factual statements is difficult for humans to verify and to base accurate decisions on. To combat this problem, we propose Highlighted Chain-of-Thought Prompting (HoT), a technique for prompting LLMs to generate responses with XML tags that ground facts to those provided in the query. That is, given an input question, the LLM first re-formats the question to add XML tags highlighting key facts, and then generates a response with highlights over the facts referenced from the input. Interestingly, in few-shot settings, HoT outperforms vanilla chain-of-thought prompting (CoT) across 17 tasks spanning arithmetic, reading comprehension, and logical reasoning. When humans are asked to verify LLM responses, the highlights help time-limited participants recognize more accurately and efficiently when the LLM is correct. Yet, surprisingly, when the LLM is wrong, HoT responses tend to make users believe that an answer is correct.
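To make the mechanism concrete, the sketch below shows one way a HoT-style prompt and its tagged output could be handled. It is an illustrative assumption, not the authors' released implementation: the tag schema (<fact1>, <fact2>, ...), the HOT_INSTRUCTION wording, the build_hot_prompt helper, and the sample response are all hypothetical.

```python
import re

# Minimal sketch of a HoT-style prompt and a parser for the tagged output.
# The tag schema (<fact1>, <fact2>, ...) and helper names are illustrative
# assumptions, not the authors' released code.

HOT_INSTRUCTION = (
    "First, re-write the question and wrap its key facts in XML tags "
    "(<fact1>...</fact1>, <fact2>...</fact2>, ...). Then answer step by step, "
    "wrapping every statement that relies on an input fact in the matching tag."
)

def build_hot_prompt(question: str, few_shot_examples: str = "") -> str:
    """Assemble a few-shot HoT prompt for an arbitrary question."""
    return f"{HOT_INSTRUCTION}\n\n{few_shot_examples}Question: {question}\nAnswer:"

# Matches an opening <factN> tag, its content, and the matching closing tag.
TAG_PATTERN = re.compile(r"<(fact\d+)>(.*?)</\1>", re.DOTALL)

def extract_highlights(tagged_text: str) -> dict[str, list[str]]:
    """Collect every tagged span keyed by fact id, e.g. {'fact1': ['16 eggs per day']}."""
    spans: dict[str, list[str]] = {}
    for tag, span in TAG_PATTERN.findall(tagged_text):
        spans.setdefault(tag, []).append(span.strip())
    return spans

if __name__ == "__main__":
    # Shape of the tagged output the prompt asks for (hypothetical example).
    sample_response = (
        "Reformatted question: Janet's ducks lay <fact1>16 eggs per day</fact1> "
        "and she eats <fact2>three for breakfast</fact2>. How many are left?\n"
        "Answer: Out of <fact1>16 eggs</fact1> she eats <fact2>three</fact2>, "
        "so 16 - 3 = 13 eggs are left."
    )
    print(extract_highlights(sample_response))
    # {'fact1': ['16 eggs per day', '16 eggs'], 'fact2': ['three for breakfast', 'three']}
```

Parsing the tags this way is what would allow a front end to render the answer's grounded spans as highlights matched to the corresponding spans in the input question.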