Experience-Guided Adaptation of Inference-Time Reasoning Strategies

Abstract

Enabling agentic AI systems to adapt their problem-solving approaches based on post-training interactions remains a fundamental challenge. Systems that update and maintain a memory at inference time have been proposed. However, existing designs steer the system only by modifying the textual input to a language model or agent. As a result, they cannot change sampling parameters, remove tools, modify system prompts, or switch between agentic and workflow paradigms. On the other hand, systems that adapt more flexibly require offline optimization and remain static once deployed. We present Experience-Guided Reasoner (EGuR), which dynamically generates tailored strategies at inference time based on accumulated experience. A strategy here is a complete computational procedure involving LLM calls, tools, sampling parameters, and control logic. We achieve this using an LLM-based meta-strategy, that is, a strategy that outputs strategies. This enables adaptation of all strategy components: prompts, sampling parameters, tool configurations, and control logic. EGuR operates through two components. A Guide generates multiple candidate strategies conditioned on the current problem and a structured memory of past experiences. A Consolidator integrates execution feedback to improve future strategy generation. This produces complete, ready-to-run strategies optimized for each problem, which can be cached, retrieved, and executed as needed without wasting resources. Across five challenging benchmarks (AIME 2025, 3-SAT, and three Big Bench Extra Hard tasks), EGuR achieves up to 14% accuracy improvements over the strongest baselines while reducing computational costs by up to 111x, with both metrics improving as the system gains experience.
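The abstract describes the Guide/Consolidator loop only at a high level. The Python sketch below illustrates that control flow under explicit assumptions, and it is not the authors' implementation. The `Strategy` fields, the category-keyed `Memory`, and the rule-based `guide` function are all hypothetical. `guide` stands in for the LLM meta-strategy, and `toy_execute` replaces real LLM and tool calls. Execution feedback is assumed to be a boolean success signal plus a cost.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(frozen=True)
class Strategy:
    """Hypothetical minimal schema for a complete, ready-to-run strategy."""
    system_prompt: str
    temperature: float          # sampling parameter
    tools: tuple[str, ...]      # tool configuration
    num_samples: int            # control logic: number of sampled calls to aggregate


# (strategy, succeeded, cost) as reported by execution feedback
Record = tuple[Strategy, bool, float]
Executor = Callable[[Strategy, str], tuple[str, bool, float]]


@dataclass
class Memory:
    """Experience store keyed by a problem category (an assumption of this sketch)."""
    records: dict[str, list[Record]] = field(default_factory=dict)

    def best(self, category: str) -> Optional[Strategy]:
        # Cheapest strategy that has succeeded on this category so far.
        wins = [(cost, s) for s, ok, cost in self.records.get(category, []) if ok]
        return min(wins, key=lambda pair: pair[0])[1] if wins else None


DEFAULT = Strategy("Solve step by step.", 0.7, ("python",), 5)
DIRECT = Strategy("Answer directly.", 0.0, (), 1)


def guide(category: str, memory: Memory) -> list[Strategy]:
    """Rule-based stand-in for the LLM meta-strategy: returns candidates in order."""
    cached = memory.best(category)
    if cached is None:
        return [DEFAULT, DIRECT]
    # Try a cheaper greedy variant of the cached strategy first, then the cached one.
    cheaper = Strategy(cached.system_prompt, 0.0, cached.tools, 1)
    return [cheaper] if cheaper == cached else [cheaper, cached]


def consolidate(category: str, memory: Memory, results: list[Record]) -> None:
    """Stand-in Consolidator: append execution feedback to the memory."""
    memory.records.setdefault(category, []).extend(results)


def solve(problem: str, category: str, memory: Memory,
          execute: Executor) -> tuple[Optional[str], float]:
    results: list[Record] = []
    answer: Optional[str] = None
    for strategy in guide(category, memory):
        out, ok, cost = execute(strategy, problem)
        results.append((strategy, ok, cost))
        if ok:  # stop at the first candidate the feedback signal accepts
            answer = out
            break
    consolidate(category, memory, results)
    return answer, sum(cost for _, _, cost in results)


def toy_execute(strategy: Strategy, problem: str) -> tuple[str, bool, float]:
    # Toy executor: cost scales with samples and tools; "hard" problems need the python tool.
    cost = float(strategy.num_samples * (1 + len(strategy.tools)))
    ok = problem == "easy" or "python" in strategy.tools
    return f"answer:{problem}", ok, cost


if __name__ == "__main__":
    memory = Memory()
    for _ in range(3):
        print(solve("hard", "3-sat", memory, toy_execute))
```

When the sketch is run repeatedly on the same category, its cost drops from 10.0 to 2.0 once a cheaper strategy has been validated. This loosely mirrors the claim that cost improves with experience. In this toy, the gain comes from a fixed rule, whereas in EGuR it comes from LLM-generated strategies.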