ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning

Abstract

Large language models (LLMs) achieve remarkable performance on challenging benchmarks that are often structured as multiple-choice question-answering (QA) tasks. Zero-shot Chain-of-Thought (CoT) prompting enhances reasoning in LLMs but provides only vague and generic guidance ("think step by step"). This paper introduces ARR, an intuitive and effective zero-shot prompting method that explicitly incorporates three key steps in QA solving: analyzing the intent of the question, retrieving relevant information, and reasoning step by step. Comprehensive experiments across diverse and challenging QA tasks demonstrate that ARR consistently improves over the Baseline (without ARR prompting) and outperforms CoT. Ablation and case studies further validate the positive contribution of each component: analyzing, retrieving, and reasoning. Notably, intent analysis plays a vital role in ARR. Additionally, extensive evaluations across various model sizes, LLM series, and generation settings confirm the effectiveness, robustness, and generalizability of ARR.
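To make the contrast between the three prompting settings concrete, here is a minimal sketch of how a multiple-choice question might be formatted for the Baseline, CoT, and ARR. Only the phrase "think step by step" comes from the abstract. The ARR trigger sentence, the "Question:/Answer:" layout, and the names `TRIGGERS` and `build_prompt` are illustrative assumptions, not the authors' exact prompt or code.

```python
import string

# Answer-trigger sentences appended after "Answer:".
# ASSUMPTION: the ARR wording below merely paraphrases the three steps named
# in the abstract; it is not quoted from the paper.
TRIGGERS = {
    "baseline": "",  # no extra guidance
    "cot": "Let's think step by step.",
    "arr": (
        "Let's analyze the intent of the question, retrieve relevant "
        "information, and reason step by step."
    ),
}


def build_prompt(question: str, options: list[str], method: str = "arr") -> str:
    """Format a multiple-choice question and append the chosen trigger.

    The layout (Question / lettered options / Answer) is a hypothetical
    template chosen for illustration.
    """
    if method not in TRIGGERS:
        raise ValueError(f"unknown method: {method!r}")
    if len(options) > len(string.ascii_uppercase):
        raise ValueError("too many options to label with single letters")

    lines = [f"Question: {question}"]
    # Label options (A), (B), (C), ...
    for letter, option in zip(string.ascii_uppercase, options):
        lines.append(f"({letter}) {option}")
    # rstrip() keeps the baseline prompt ending in a bare "Answer:".
    lines.append(f"Answer: {TRIGGERS[method]}".rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    demo_question = "Which gas do plants absorb during photosynthesis?"
    demo_options = ["Oxygen", "Carbon dioxide", "Nitrogen"]
    for name in TRIGGERS:
        print(f"--- {name} ---")
        print(build_prompt(demo_question, demo_options, name))
```

In this sketch the three settings differ only in the final trigger sentence, so any accuracy difference in an experiment built this way would come from the guidance itself rather than from the question formatting.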
