Distilling Feedback into Memory-as-a-Tool

Abstract

We propose a framework that amortizes the cost of inference-time reasoning by converting transient critiques into retrievable guidelines, using a file-based memory system and agent-controlled tool calls. We evaluate the method on the Rubric Feedback Bench, a new dataset for rubric-based learning. In our experiments, the memory-augmented LLMs rapidly match the performance of test-time refinement pipelines while drastically reducing inference cost.
