ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning
February 7, 2025
Authors: Yuwei Yin, Giuseppe Carenini
cs.AI
Abstract
Large language models (LLMs) achieve remarkable performance on challenging
benchmarks that are often structured as multiple-choice question-answering (QA)
tasks. Zero-shot Chain-of-Thought (CoT) prompting enhances reasoning in LLMs
but provides only vague and generic guidance ("think step by step"). This paper
introduces ARR, an intuitive and effective zero-shot prompting method that
explicitly incorporates three key steps in QA solving: analyzing the intent of
the question, retrieving relevant information, and reasoning step by step.
Comprehensive experiments across diverse and challenging QA tasks demonstrate
that ARR consistently improves the Baseline (without ARR prompting) and
outperforms CoT. Ablation and case studies further validate the positive
contributions of each component: analyzing, retrieving, and reasoning. Notably,
intent analysis plays a vital role in ARR. Additionally, extensive evaluations
across various model sizes, LLM series, and generation settings solidify the
effectiveness, robustness, and generalizability of ARR.
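
The three prompting conditions compared in the abstract (Baseline, CoT, ARR) can be sketched as a small prompt builder. This is a minimal illustration, not the authors' implementation: the `build_prompt` helper, the `TRIGGERS` table, the question layout, and the exact wording of the ARR trigger sentence are all assumptions made here for clarity. Only the CoT phrase "think step by step" and the three ARR steps (analyze intent, retrieve relevant information, reason step by step) come from the abstract.

```python
# Hypothetical trigger sentences appended after "Answer:".
# The ARR wording below is an assumed paraphrase of the three steps
# named in the abstract, not a quotation from the paper.
TRIGGERS = {
    "baseline": "",  # no extra guidance
    "cot": "Let's think step by step.",
    "arr": (
        "Let's analyze the intent of the question, find relevant information, "
        "and answer the question with step-by-step reasoning."
    ),
}


def build_prompt(question: str, options: list[str], method: str = "arr") -> str:
    """Format a multiple-choice question as a zero-shot prompt (assumed layout)."""
    if method not in TRIGGERS:
        raise ValueError(f"unknown method: {method!r}")
    if len(options) > 26:
        raise ValueError("at most 26 options are supported")
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    lines = [f"Question: {question}"]
    # Label each option (A), (B), ... in order.
    lines += [f"({letters[i]}) {opt}" for i, opt in enumerate(options)]
    # The trigger sentence steers how the model begins its answer.
    lines.append(f"Answer: {TRIGGERS[method]}".rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    q = "Which gas do plants absorb during photosynthesis?"
    opts = ["Oxygen", "Carbon dioxide", "Nitrogen"]
    for method in ("baseline", "cot", "arr"):
        print(f"--- {method} ---")
        print(build_prompt(q, opts, method))
```

The resulting string would be sent to an LLM as-is. Only the trigger sentence differs between conditions, which is what makes the three methods directly comparable in a zero-shot setting.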