Distilling Feedback into Memory-as-a-Tool

Abstract

We propose a framework that amortizes the cost of inference-time reasoning by converting transient critiques into retrievable guidelines, through a file-based memory system and agent-controlled tool calls. We evaluate this method on the Rubric Feedback Bench, a novel dataset for rubric-based learning. Experiments demonstrate that our augmented LLMs rapidly match the performance of test-time refinement pipelines while drastically reducing inference cost.