HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs
March 3, 2025
Authors: Tin Nguyen, Logan Bolton, Mohammad Reza Taesiri, Anh Totti Nguyen
cs.AI
Abstract
An Achilles heel of Large Language Models (LLMs) is their tendency to hallucinate non-factual statements. A response that mixes factual and non-factual statements is difficult for humans to verify and to base accurate decisions on. To combat this problem, we propose Highlighted Chain-of-Thought Prompting (HoT), a technique for prompting LLMs to generate responses with XML tags that ground facts to those provided in the query. That is, given an input question, the LLM first re-formats the question to add XML tags highlighting key facts, and then generates a response with highlights over the facts referenced from the input. Interestingly, in few-shot settings, HoT outperforms vanilla chain-of-thought prompting (CoT) across 17 tasks spanning arithmetic, reading comprehension, and logical reasoning. When humans are asked to verify LLM responses, the highlights help time-limited participants recognize more accurately and efficiently when the LLM is correct. Yet, surprisingly, when the LLM is wrong, HoT responses tend to make users believe that an answer is correct.
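To make the mechanism concrete, the sketch below shows one way a HoT-style prompt and its tagged output could be handled. It is an illustrative assumption, not the authors' released implementation: the tag schema (<fact1>, <fact2>, ...), the HOT_INSTRUCTION wording, the build_hot_prompt helper, and the sample response are all hypothetical.

```python
import re

# Minimal sketch of a HoT-style prompt and a parser for the tagged output.
# The tag schema (<fact1>, <fact2>, ...) and helper names are illustrative
# assumptions, not the authors' released code.

HOT_INSTRUCTION = (
    "First, re-write the question and wrap its key facts in XML tags "
    "(<fact1>...</fact1>, <fact2>...</fact2>, ...). Then answer step by step, "
    "wrapping every statement that relies on an input fact in the matching tag."
)

def build_hot_prompt(question: str, few_shot_examples: str = "") -> str:
    """Assemble a few-shot HoT prompt for an arbitrary question."""
    return f"{HOT_INSTRUCTION}\n\n{few_shot_examples}Question: {question}\nAnswer:"

# Matches an opening <factN> tag, its content, and the matching closing tag.
TAG_PATTERN = re.compile(r"<(fact\d+)>(.*?)</\1>", re.DOTALL)

def extract_highlights(tagged_text: str) -> dict[str, list[str]]:
    """Collect every tagged span keyed by fact id, e.g. {'fact1': ['16 eggs per day']}."""
    spans: dict[str, list[str]] = {}
    for tag, span in TAG_PATTERN.findall(tagged_text):
        spans.setdefault(tag, []).append(span.strip())
    return spans

if __name__ == "__main__":
    # Shape of the tagged output the prompt asks for (hypothetical example).
    sample_response = (
        "Reformatted question: Janet's ducks lay <fact1>16 eggs per day</fact1> "
        "and she eats <fact2>three for breakfast</fact2>. How many are left?\n"
        "Answer: Out of <fact1>16 eggs</fact1> she eats <fact2>three</fact2>, "
        "so 16 - 3 = 13 eggs are left."
    )
    print(extract_highlights(sample_response))
    # {'fact1': ['16 eggs per day', '16 eggs'], 'fact2': ['three for breakfast', 'three']}
```

Parsing the tags this way is what would allow a front end to render the answer's grounded spans as highlights matched to the corresponding spans in the input question.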