ChatPaper.ai

PEARL: Prompting Large Language Models to Plan and Execute Actions Over Long Documents

May 23, 2023
作者: Simeng Sun, Yang Liu, Shuohang Wang, Chenguang Zhu, Mohit Iyyer
cs.AI

Abstract

Strategies such as chain-of-thought prompting improve the performance of large language models (LLMs) on complex reasoning tasks by decomposing input examples into intermediate steps. However, it remains unclear how to apply such methods to reason over long input documents, in which both the decomposition and the output of each intermediate step are non-trivial to obtain. In this work, we propose PEARL, a prompting framework to improve reasoning over long documents, which consists of three stages: action mining, plan formulation, and plan execution. More specifically, given a question about a long document, PEARL decomposes the question into a sequence of actions (e.g., SUMMARIZE, FIND_EVENT, FIND_RELATION) and then executes them over the document to obtain the answer. Each stage of PEARL is implemented via zero-shot or few-shot prompting of LLMs (in our work, GPT-4) with minimal human input. We evaluate PEARL on a challenging subset of the QuALITY dataset, which contains questions that require complex reasoning over long narrative texts. PEARL outperforms zero-shot and chain-of-thought prompting on this dataset, and ablation experiments show that each stage of PEARL is critical to its performance. Overall, PEARL is a first step towards leveraging LLMs to reason over long documents.
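The three-stage flow described above (mine a set of actions, prompt the LLM to formulate a plan, then execute each action over the document) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `llm` callable, the prompt wording, the `ACTION(argument)` plan syntax, and the `fake_llm` stub are all assumptions made for demonstration; only the action names (SUMMARIZE, FIND_EVENT, FIND_RELATION) come from the abstract.

```python
# Illustrative sketch of PEARL's plan-formulation and plan-execution
# stages, assuming a generic `llm(prompt) -> str` completion function.

ACTIONS = {
    "SUMMARIZE": "Summarize the relevant span of the document.",
    "FIND_EVENT": "Locate the event named in the argument.",
    "FIND_RELATION": "Describe the relation between two entities.",
}

def formulate_plan(question, llm):
    """Stage 2: ask the LLM to decompose the question into a sequence
    of actions, one ACTION(argument) per line (hypothetical syntax)."""
    action_list = "\n".join(f"- {n}: {d}" for n, d in ACTIONS.items())
    prompt = (
        f"Available actions:\n{action_list}\n\n"
        f"Question: {question}\n"
        "Write a plan, one ACTION(argument) per line."
    )
    plan = []
    for line in llm(prompt).splitlines():
        name, _, arg = line.partition("(")
        if name.strip() in ACTIONS:
            plan.append((name.strip(), arg.rstrip(")")))
    return plan

def execute_plan(plan, document, llm):
    """Stage 3: execute each action over the document, feeding earlier
    results forward so later steps can build on them."""
    context = ""
    for name, arg in plan:
        prompt = (
            f"Document:\n{document}\n\n"
            f"Intermediate results so far:\n{context}\n"
            f"Perform {name} with argument: {arg}"
        )
        context += f"{name}({arg}) -> {llm(prompt)}\n"
    return context  # the accumulated results yield the final answer

# Canned stand-in for a GPT-4 call, so the sketch runs without an API key.
def fake_llm(prompt):
    if "Write a plan" in prompt:
        return "FIND_EVENT(the storm)\nSUMMARIZE(the aftermath)"
    return "stub result"

plan = formulate_plan("What happened after the storm?", fake_llm)
print(plan)  # [('FIND_EVENT', 'the storm'), ('SUMMARIZE', 'the aftermath')]
```

In the actual framework each stage is itself produced by zero-shot or few-shot prompting of GPT-4, including mining the action set from training questions rather than hard-coding it as done here.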