Distilling Feedback into Memory-as-a-Tool

Abstract

We propose a framework that amortizes the cost of inference-time reasoning by converting transient critiques into retrievable guidelines, through a file-based memory system and agent-controlled tool calls. We evaluate this method on the Rubric Feedback Bench, a novel dataset for rubric-based learning. Experiments demonstrate that our augmented LLMs rapidly match the performance of test-time refinement pipelines while drastically reducing inference cost.
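The abstract does not describe the tool interface, so the following is only an illustrative sketch of the general idea of "memory as a tool": an agent distills a critique into a one-line guideline, writes it to a file through a tool call, and later reads it back instead of re-running the critique. Everything here is a hypothetical assumption, not the paper's implementation: the `FileMemory` class, the tool names `write_guideline` and `read_guidelines`, the one-file-per-topic layout, exact-topic lookup (no semantic retrieval), and the JSON tool-call format `{"name": ..., "arguments": {...}}`.

```python
import json
import tempfile
from pathlib import Path


class FileMemory:
    """HYPOTHETICAL file-backed memory: one text file per topic, one guideline per line."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, topic: str) -> Path:
        # Map the topic to a filesystem-safe file name.
        safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in topic)
        return self.root / f"{safe}.md"

    def write_guideline(self, topic: str, guideline: str) -> str:
        """Persist a distilled guideline, skipping exact duplicates."""
        line = " ".join(guideline.split())  # collapse whitespace to a single line
        path = self._path(topic)
        if line and line not in self.read_guidelines(topic):
            with path.open("a", encoding="utf-8") as f:
                f.write(line + "\n")
        return f"stored under {path.name}"

    def read_guidelines(self, topic: str) -> list[str]:
        """Return all stored guidelines for a topic (empty list if none)."""
        path = self._path(topic)
        if not path.exists():
            return []
        return [ln for ln in path.read_text(encoding="utf-8").splitlines() if ln]


def dispatch(memory: FileMemory, call_json: str) -> str:
    """Execute an agent tool call given as JSON (HYPOTHETICAL call format)."""
    call = json.loads(call_json)
    tools = {
        "write_guideline": memory.write_guideline,
        "read_guidelines": lambda topic: json.dumps(memory.read_guidelines(topic)),
    }
    return tools[call["name"]](**call["arguments"])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        mem = FileMemory(Path(d))
        # After receiving a critique, the agent saves a reusable rule.
        dispatch(mem, json.dumps({"name": "write_guideline",
                                  "arguments": {"topic": "essay-rubric",
                                                "guideline": "Cite evidence for every claim."}}))
        # On a later task, it retrieves the rule instead of regenerating the critique.
        print(dispatch(mem, json.dumps({"name": "read_guidelines",
                                        "arguments": {"topic": "essay-rubric"}})))
```

Storing guidelines as plain files keeps the memory inspectable and editable outside the model; the amortization comes from paying for a critique once and then reusing its distilled guideline on later inputs.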
