Experience-Guided Adaptation of Inference-Time Reasoning Strategies

November 14, 2025
Authors: Adam Stein, Matthew Trager, Benjamin Bowman, Michael Kleinman, Aditya Chattopadhyay, Wei Xia, Stefano Soatto
cs.AI

Abstract

Enabling agentic AI systems to adapt their problem-solving approaches based on post-training interactions remains a fundamental challenge. While systems that update and maintain a memory at inference time have been proposed, existing designs only steer the system by modifying textual input to a language model or agent, which means that they cannot change sampling parameters, remove tools, modify system prompts, or switch between agentic and workflow paradigms. On the other hand, systems that adapt more flexibly require offline optimization and remain static once deployed. We present Experience-Guided Reasoner (EGuR), which generates tailored strategies -- complete computational procedures involving LLM calls, tools, sampling parameters, and control logic -- dynamically at inference time based on accumulated experience. We achieve this using an LLM-based meta-strategy -- a strategy that outputs strategies -- enabling adaptation of all strategy components (prompts, sampling parameters, tool configurations, and control logic). EGuR operates through two components: a Guide generates multiple candidate strategies conditioned on the current problem and structured memory of past experiences, while a Consolidator integrates execution feedback to improve future strategy generation. This produces complete, ready-to-run strategies optimized for each problem, which can be cached, retrieved, and executed as needed without wasting resources. Across five challenging benchmarks (AIME 2025, 3-SAT, and three Big Bench Extra Hard tasks), EGuR achieves up to 14% accuracy improvements over the strongest baselines while reducing computational costs by up to 111x, with both metrics improving as the system gains experience.
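
The Guide/Consolidator loop described above can be illustrated with a small toy sketch. This is not the authors' implementation: every name here (`Strategy`, `Memory`, `guide`, `consolidate`, `solve`, `POOL`) is hypothetical, the Guide is a simple ranking function over a fixed pool rather than an LLM that writes new strategies, the "LLM calls" are plain Python functions, and feedback is assumed to come from comparing against a known answer. It is meant only to show how recorded experience could change which complete strategies are proposed and run.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Strategy:
    """A complete runnable procedure: prompt, sampling parameters, tools, control logic."""
    name: str
    system_prompt: str
    temperature: float
    tools: Tuple[str, ...]
    run: Callable[[str], str]  # control logic; stands in for real LLM/tool calls
    cost: float                # hypothetical cost units per execution


@dataclass
class Memory:
    """Structured experience: (problem kind, strategy name, correct?, cost)."""
    records: List[Tuple[str, str, bool, float]] = field(default_factory=list)

    def success_rate(self, kind: str, name: str) -> float:
        hits = [ok for k, n, ok, _ in self.records if k == kind and n == name]
        # Untried strategies get a neutral prior so they are still explored.
        return sum(hits) / len(hits) if hits else 0.5


def guide(kind: str, memory: Memory, pool: List[Strategy], k: int = 2) -> List[Strategy]:
    """Propose up to k candidates: drop strategies that always failed, then
    prefer higher past success and, on ties, lower cost."""
    viable = [s for s in pool if memory.success_rate(kind, s.name) > 0.0]
    ranked = sorted(viable or pool,
                    key=lambda s: (-memory.success_rate(kind, s.name), s.cost))
    return ranked[:k]


def consolidate(memory: Memory, kind: str, strategy: Strategy, correct: bool) -> None:
    """Fold execution feedback into memory for future guide() calls."""
    memory.records.append((kind, strategy.name, correct, strategy.cost))


def solve(problem: str, answer: str, kind: str, memory: Memory, pool: List[Strategy]):
    """Run every proposed candidate, record feedback, return (result, total cost)."""
    total, result = 0.0, None
    for s in guide(kind, memory, pool):
        out = s.run(problem)
        total += s.cost
        ok = out == answer  # assumption: a ground-truth check supplies the feedback
        consolidate(memory, kind, s, ok)
        if ok and result is None:
            result = out
    return result, total


def first_operand(p: str) -> str:
    return p.split("+")[0]  # cheap but wrong "direct guess"


def add_with_tool(p: str) -> str:
    return str(sum(int(x) for x in p.split("+")))  # stands in for a calculator tool


POOL = [
    Strategy("direct_guess", "Answer immediately.", 0.0, (), first_operand, 1.0),
    Strategy("calculator", "Use the calculator tool.", 0.0, ("calculator",), add_with_tool, 5.0),
]

memory = Memory()
first = solve("2+3", "5", "arithmetic", memory, POOL)
second = solve("4+9", "13", "arithmetic", memory, POOL)
print(first, second)
```

In this toy run the first problem executes both candidates, while the second executes only the strategy that previously succeeded, so the total cost falls from 6.0 to 5.0 units. The numbers are artifacts of the toy pool and say nothing about the benchmark results reported in the abstract.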