

PEARL: Prompting Large Language Models to Plan and Execute Actions Over Long Documents

May 23, 2023
作者: Simeng Sun, Yang Liu, Shuohang Wang, Chenguang Zhu, Mohit Iyyer
cs.AI

Abstract

Strategies such as chain-of-thought prompting improve the performance of large language models (LLMs) on complex reasoning tasks by decomposing input examples into intermediate steps. However, it remains unclear how to apply such methods to reason over long input documents, in which both the decomposition and the output of each intermediate step are non-trivial to obtain. In this work, we propose PEARL, a prompting framework to improve reasoning over long documents, which consists of three stages: action mining, plan formulation, and plan execution. More specifically, given a question about a long document, PEARL decomposes the question into a sequence of actions (e.g., SUMMARIZE, FIND_EVENT, FIND_RELATION) and then executes them over the document to obtain the answer. Each stage of PEARL is implemented via zero-shot or few-shot prompting of LLMs (in our work, GPT-4) with minimal human input. We evaluate PEARL on a challenging subset of the QuALITY dataset, which contains questions that require complex reasoning over long narrative texts. PEARL outperforms zero-shot and chain-of-thought prompting on this dataset, and ablation experiments show that each stage of PEARL is critical to its performance. Overall, PEARL is a first step towards leveraging LLMs to reason over long documents.
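The three-stage pipeline described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `pearl` function, the prompt strings, and the `llm` callable (a stand-in for a zero- or few-shot prompted model such as GPT-4) are all hypothetical names introduced here for clarity.

```python
def pearl(question, document, llm):
    """Hypothetical sketch of PEARL's three stages.

    `llm` is assumed to be a callable that takes a prompt and returns
    either a list (for the mining/planning stages) or a string.
    """
    # Stage 1: action mining -- elicit a reusable set of actions
    # (e.g. SUMMARIZE, FIND_EVENT, FIND_RELATION) for this question type.
    actions = llm(
        f"List actions (e.g. SUMMARIZE, FIND_EVENT, FIND_RELATION) "
        f"useful for answering: {question}"
    )

    # Stage 2: plan formulation -- decompose the question into an
    # ordered sequence of the mined actions.
    plan = llm(
        f"Using actions {actions}, write a step-by-step plan to "
        f"answer: {question}"
    )

    # Stage 3: plan execution -- run each step over the document,
    # feeding each intermediate output into the next step.
    context = document
    for step in plan:
        context = llm(f"Execute '{step}' given: {context}")
    return context
```

The key design point the abstract highlights is that every stage is itself an LLM call, so the only human input is the initial prompt templates.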