Hyper-multi-step: The Truth Behind Difficult Long-context Tasks
October 6, 2024
Author: Yijiong Yu
cs.AI
Abstract
Long-context language models (LCLMs), characterized by their extensive context
windows, are becoming increasingly popular. Meanwhile, many long-context
benchmarks present challenging tasks that even the most advanced LCLMs struggle
to complete. However, the underlying sources of these difficulties have seldom
been studied. To bridge this gap, we conduct experiments showing that the
difficulty stems primarily from two basic issues: "multi-matching retrieval,"
which requires the simultaneous retrieval of multiple items, and "logic-based
retrieval," which necessitates logical judgment within the retrieval criteria.
These two problems, while seemingly straightforward, actually exceed the
capabilities of LCLMs because they prove to be hyper-multi-step in nature,
demanding numerous steps to solve. This finding explains why LLMs struggle with
more advanced long-context tasks and provides a more accurate perspective for
rethinking solutions to them.