REVERE: Reflective Evolving Research Engineer for Scientific Workflows

Abstract


Existing prompt-optimization techniques update behavior from local signals, often overlooking broader patterns that recur across tasks, which leads to poor generalization; they also rely on full-prompt rewrites or unstructured merges, which causes knowledge loss. These limitations are magnified in research-coding workflows, where reproducing results from public codebases is an established evaluation regime and agents must cope with heterogeneous repositories, underspecified environments, and weak feedback. We introduce Reflective Evolving Research Engineer (REVERE), a framework that continuously learns from a Global Training Context, recognizes recurring failure modes in cross-repository execution trajectories, distills them into reusable heuristics, and makes targeted edits to three configurable fields: the system prompt, a task-prompt template, and a cumulative cheatsheet. Through this reflective optimization, REVERE outperforms prior state-of-the-art expert-crafted instructions on research-coding tasks by 4.50% on SUPER, 3.51% on ResearchCodeBench, and 4.89% on ScienceAgentBench, each measured on the benchmark's own metric. These results demonstrate that agents equipped with mechanisms for continual learning and global memory consolidation can meaningfully evolve their capabilities over time.
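The abstract describes the optimization loop only at a high level. The following is an illustrative reading of it, not the authors' implementation. It assumes that each trajectory is already labeled with the repository it came from (`repo`) and its failure modes (`failures`). It treats a failure as "recurring" when it appears in at least two distinct repositories, which is an arbitrary threshold. It also replaces the reflection step with a fixed text template in `distill`. Every name here (`AgentConfig`, `Edit`, `recurring_failures`, `distill`, `apply_edit`, `reflect_step`) is hypothetical. The point of the sketch is the contrast the abstract draws: each edit touches exactly one of the three fields, and the cheatsheet only accumulates entries rather than being regenerated as a whole prompt.

```python
from dataclasses import dataclass, field


@dataclass
class AgentConfig:
    """Hypothetical container for the three editable fields named in the abstract."""
    system_prompt: str
    task_template: str
    cheatsheet: list[str] = field(default_factory=list)


@dataclass
class Edit:
    """One targeted edit against a single field (hypothetical schema)."""
    field_name: str  # "system_prompt", "task_template", or "cheatsheet"
    op: str          # "append" or "replace"
    new: str
    old: str = ""    # substring to replace; used only when op == "replace"


def recurring_failures(trajectories: list[dict], min_repos: int = 2) -> list[str]:
    """Return failure labels observed in at least `min_repos` distinct repositories."""
    repos_by_label: dict[str, set[str]] = {}
    for traj in trajectories:
        for label in traj["failures"]:
            repos_by_label.setdefault(label, set()).add(traj["repo"])
    return sorted(label for label, repos in repos_by_label.items() if len(repos) >= min_repos)


def distill(label: str) -> str:
    """Hypothetical stand-in for reflection: turn a failure mode into a reusable heuristic."""
    return f"Before running experiments, guard against: {label}."


def apply_edit(cfg: AgentConfig, edit: Edit) -> AgentConfig:
    """Apply one edit in place, leaving every other field untouched."""
    if edit.field_name == "cheatsheet":
        # The cheatsheet is cumulative: only appends are allowed, exact duplicates are skipped.
        if edit.op != "append":
            raise ValueError("the cheatsheet only supports append")
        if edit.new not in cfg.cheatsheet:
            cfg.cheatsheet.append(edit.new)
        return cfg
    if edit.field_name not in ("system_prompt", "task_template"):
        raise ValueError(f"unknown field: {edit.field_name}")
    text = getattr(cfg, edit.field_name)
    if edit.op == "replace":
        if edit.old not in text:
            raise ValueError(f"{edit.old!r} not found in {edit.field_name}")
        text = text.replace(edit.old, edit.new, 1)  # local substitution, not a full rewrite
    elif edit.op == "append":
        text = f"{text}\n{edit.new}"
    else:
        raise ValueError(f"unknown op: {edit.op}")
    setattr(cfg, edit.field_name, text)
    return cfg


def reflect_step(cfg: AgentConfig, trajectories: list[dict]) -> AgentConfig:
    """One round: find cross-repository failures, distill them, append to the cheatsheet."""
    for label in recurring_failures(trajectories):
        apply_edit(cfg, Edit("cheatsheet", "append", distill(label)))
    return cfg
```

For example, given three trajectories where "missing dependency" occurs in two repositories and every other failure occurs in only one, `reflect_step` adds a single cheatsheet entry for "missing dependency" and leaves the system prompt and task template unchanged. Running it again adds nothing, because the cheatsheet skips exact duplicates.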
