

ReflAct: World-Grounded Decision Making in LLM Agents via Goal-State Reflection

May 21, 2025
Authors: Jeonghye Kim, Sojeong Rhee, Minbeom Kim, Dohyung Kim, Sangmook Lee, Youngchul Sung, Kyomin Jung
cs.AI

Abstract

Recent advances in LLM agents have largely built on reasoning backbones like ReAct, which interleave thought and action in complex environments. However, ReAct often produces ungrounded or incoherent reasoning steps, leading to misalignment between the agent's actual state and goal. Our analysis finds that this stems from ReAct's inability to maintain consistent internal beliefs and goal alignment, causing compounding errors and hallucinations. To address this, we introduce ReflAct, a novel backbone that shifts reasoning from merely planning next actions to continuously reflecting on the agent's state relative to its goal. By explicitly grounding decisions in states and enforcing ongoing goal alignment, ReflAct dramatically improves strategic reliability. This design delivers substantial empirical gains: ReflAct surpasses ReAct by 27.7% on average, achieving a 93.3% success rate in ALFWorld. Notably, ReflAct even outperforms ReAct with added enhancement modules (e.g., Reflexion, WKM), showing that strengthening the core reasoning backbone is key to reliable agent performance.
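To make the contrast with ReAct concrete, below is a minimal sketch of what a ReflAct-style decision step could look like, assuming a generic llm(prompt) -> str wrapper and a simple textual history. The function name, prompt wording, and interfaces are illustrative assumptions based on the abstract's description, not the authors' actual implementation or templates.

```python
# A minimal sketch of a ReflAct-style decision step, inferred from the
# abstract: reflect on the agent's state relative to its goal, then act.
# The llm callable, prompt wording, and history format are assumptions.

from typing import Callable

def reflact_step(
    llm: Callable[[str], str],   # any text-in/text-out LLM wrapper
    goal: str,
    history: list[str],          # prior observation/reflection/action lines
    observation: str,
) -> tuple[str, str]:
    """One ReflAct step: goal-state reflection first, then the action."""
    context = "\n".join(history + [f"Observation: {observation}"])
    # Unlike ReAct's open-ended "Thought:", the reflection explicitly
    # grounds the agent's belief in the current state and checks it
    # against the goal before any action is chosen.
    reflection = llm(
        f"Goal: {goal}\n{context}\n"
        "Reflect: describe the current state and how it relates to the goal."
    )
    action = llm(
        f"Goal: {goal}\n{context}\n"
        f"Reflection: {reflection}\nNext action:"
    )
    history.extend([
        f"Observation: {observation}",
        f"Reflection: {reflection}",
        f"Action: {action}",
    ])
    return reflection, action
```

A ReAct-style step would instead prompt for a free-form "Thought:" about what to do next; per the analysis above, such thoughts can drift from the environment's actual state, which the explicit goal-state reflection is designed to prevent.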

