Revealing the Barriers of Language Agents in Planning

October 16, 2024
Authors: Jian Xie, Kexun Zhang, Jiangjie Chen, Siyu Yuan, Kai Zhang, Yikai Zhang, Lei Li, Yanghua Xiao
cs.AI

Abstract

Autonomous planning has been an ongoing pursuit since the inception of artificial intelligence. Based on curated problem solvers, early planning agents could deliver precise solutions for specific tasks but lacked generalization. The emergence of large language models (LLMs) and their powerful reasoning capabilities has reignited interest in autonomous planning, since they can automatically generate reasonable solutions for given tasks. However, prior research and our experiments show that current language agents still lack human-level planning abilities. Even the state-of-the-art reasoning model, OpenAI o1, achieves only 15.6% on one of the complex real-world planning benchmarks. This highlights a critical question: What hinders language agents from achieving human-level planning? Although existing studies have highlighted weak performance in agent planning, the deeper underlying issues, as well as the mechanisms and limitations of the strategies proposed to address them, remain insufficiently understood. In this work, we apply a feature attribution study and identify two key factors that hinder agent planning: the limited role of constraints and the diminishing influence of questions. We also find that although current strategies help mitigate these challenges, they do not fully resolve them, indicating that agents still have a long way to go before reaching human-level intelligence.
