A Stitch in Time Saves Nine: Proactive Self-Refinement for Language Models

August 18, 2025
Authors: Jinyi Han, Xinyi Wang, Haiquan Zhao, Tingyun Li, Zishang Jiang, Sihang Jiang, Jiaqing Liang, Xin Lin, Weikang Zhou, Zeye Sun, Fei Yu, Yanghua Xiao
cs.AI

Abstract

Recent advances in self-refinement have demonstrated significant potential for improving the outputs of large language models (LLMs) through iterative refinement. However, most existing self-refinement methods rely on a reactive process with a fixed number of iterations, making it difficult to determine the optimal timing and content of refinement based on the evolving generation context. Inspired by the way humans dynamically refine their thoughts during execution, we propose ProActive Self-Refinement (PASR), a novel method that enables LLMs to refine their outputs during the generation process. Unlike methods that regenerate entire responses, PASR proactively decides whether, when, and how to refine based on the model's internal state and evolving context. We conduct extensive experiments on a diverse set of 10 tasks to evaluate the effectiveness of PASR. Experimental results show that PASR significantly enhances problem-solving performance. In particular, on Qwen3-8B, PASR reduces average token consumption by 41.6 percent compared to standard generation, while also achieving an 8.2 percent improvement in accuracy. Our code and all baselines used in the paper are available on GitHub.
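
To make the interleaved generate-then-refine loop concrete, below is a minimal, hypothetical sketch of the decoding procedure the abstract describes. It is not the authors' implementation: the `llm_continue` and `llm_wants_refinement` stubs and the `<refine>` markers are illustrative assumptions; in PASR the refinement decision comes from the model itself during generation, not from an external heuristic.

```python
# Conceptual sketch of PASR-style proactive self-refinement during decoding.
# All names here (llm_continue, llm_wants_refinement, <refine> markers) are
# hypothetical stand-ins, not the paper's actual interface.

REFINE_OPEN, REFINE_CLOSE = "<refine>", "</refine>"
EOS = "<eos>"

def llm_continue(context: str, stop: tuple[str, ...]) -> str:
    """Stub: decode the next segment until a stop marker or end-of-sequence."""
    return " ...next reasoning segment... " + EOS  # placeholder output

def llm_wants_refinement(context: str) -> bool:
    """Stub: signal whether the draft so far should be revised in place."""
    return False  # placeholder; PASR lets the model emit this signal itself

def generate_with_pasr(prompt: str, max_segments: int = 16) -> str:
    """Interleave generation with optional in-place refinement segments."""
    context = prompt
    for _ in range(max_segments):
        segment = llm_continue(context, stop=(REFINE_OPEN, EOS))
        context += segment
        # Proactive step: rather than regenerating the entire response after
        # the fact, optionally insert a short refinement of the draft so far.
        if llm_wants_refinement(context):
            fix = llm_continue(context + REFINE_OPEN, stop=(REFINE_CLOSE,))
            context += REFINE_OPEN + fix + REFINE_CLOSE
        if EOS in segment:  # the model finished its answer
            break
    return context

if __name__ == "__main__":
    print(generate_with_pasr("Q: 17 * 24 = ?\nA:"))
```

The loop structure captures the contrast the abstract draws: refinement is decided and applied while the answer is being produced, instead of reactively regenerating the whole response for a fixed number of post-hoc iterations.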