
Solving a Million-Step LLM Task with Zero Errors

November 12, 2025
Authors: Elliot Meyerson, Giuseppe Paolo, Roberto Dailey, Hormoz Shahrzad, Olivier Francon, Conor F. Hayes, Xin Qiu, Babak Hodjat, Risto Miikkulainen
cs.AI

Abstract

LLMs have achieved remarkable breakthroughs in reasoning, insights, and tool use, but chaining these abilities into extended processes at the scale of those routinely executed by humans, organizations, and societies has remained out of reach. The models have a persistent error rate that prevents scale-up: for instance, recent experiments in the Towers of Hanoi benchmark domain showed that the process inevitably becomes derailed after at most a few hundred steps. Thus, although LLM research is often still benchmarked on tasks with relatively few dependent logical steps, there is increasing attention on the ability (or inability) of LLMs to perform long-range tasks. This paper describes MAKER, the first system that successfully solves a task with over one million LLM steps with zero errors, and, in principle, scales far beyond this level. The approach relies on an extreme decomposition of a task into subtasks, each of which can be tackled by focused microagents. The high level of modularity resulting from the decomposition allows error correction to be applied at each step through an efficient multi-agent voting scheme. This combination of extreme decomposition and error correction makes scaling possible. Thus, the results suggest that instead of relying on continual improvement of current LLMs, massively decomposed agentic processes (MDAPs) may provide a way to efficiently solve problems at the level of organizations and societies.
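
The abstract's argument, that a persistent per-step error rate derails long chains while per-step voting makes them feasible, can be illustrated with a short calculation. The sketch below is not MAKER's actual voting scheme, which the abstract does not specify. It assumes a plain majority vote over k independent samples, a hypothetical 1% per-step error rate, and the pessimistic case in which all wrong votes agree with each other. All function names are illustrative.

```python
import math
from collections import Counter


def chain_success(step_error: float, n_steps: int) -> float:
    """Probability that all n_steps succeed, assuming independent steps."""
    # log1p keeps precision when step_error is tiny.
    return math.exp(n_steps * math.log1p(-step_error))


def voted_error(step_error: float, k: int) -> float:
    """Probability that a majority of k independent votes is wrong.

    Assumes k is odd and (pessimistically) that all wrong votes agree.
    """
    need = k // 2 + 1  # wrong votes needed to win the majority
    return sum(
        math.comb(k, i) * step_error**i * (1 - step_error) ** (k - i)
        for i in range(need, k + 1)
    )


def majority_vote(answers: list[str]) -> str:
    """Return the most common answer among the sampled votes."""
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    error = 0.01  # hypothetical per-step error rate, not from the paper
    print("300 steps, no voting:", chain_success(error, 300))
    for k in (1, 3, 7, 11):
        p = chain_success(voted_error(error, k), 1_000_000)
        print(f"1,000,000 steps, k={k} votes per step:", p)
```

Under these assumptions, an unvoted chain is unlikely to survive even a few hundred steps, whereas a modest number of votes per step pushes the per-step error low enough for a million-step chain to finish cleanly with high probability. Voting of this kind is only practical because the decomposition makes each step small enough to be sampled and compared independently.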