Revealing the Barriers of Language Agents in Planning

Abstract

Autonomous planning has been an ongoing pursuit since the inception of artificial intelligence. Built on curated problem solvers, early planning agents could deliver precise solutions for specific tasks but lacked generalization. The emergence of large language models (LLMs) and their powerful reasoning capabilities has reignited interest in autonomous planning, since they can automatically generate reasonable solutions for given tasks. However, prior research and our experiments show that current language agents still lack human-level planning abilities. Even the state-of-the-art reasoning model, OpenAI o1, achieves only 15.6% on one of the complex real-world planning benchmarks. This raises a critical question: what hinders language agents from achieving human-level planning? Although existing studies have highlighted weak performance in agent planning, the deeper underlying issues, and the mechanisms and limitations of the strategies proposed to address them, remain insufficiently understood. In this work, we apply a feature attribution study and identify two key factors that hinder agent planning: the limited role of constraints and the diminishing influence of questions. We also find that although current strategies help mitigate these challenges, they do not fully resolve them, indicating that agents still have a long way to go before reaching human-level intelligence.
