
KV Cache Steering for Inducing Reasoning in Small Language Models

July 11, 2025
作者: Max Belitsky, Dawid J. Kopiczko, Michael Dorkenwald, M. Jehanzeb Mirza, Cees G. M. Snoek, Yuki M. Asano
cs.AI

Abstract

We propose cache steering, a lightweight method for implicit steering of language models via a one-shot intervention applied directly to the key-value cache. To validate its effectiveness, we apply cache steering to induce chain-of-thought reasoning in small language models. Our approach leverages GPT-4o-generated reasoning traces to construct steering vectors that shift model behavior toward more explicit, multi-step reasoning without fine-tuning or prompt modifications. Experimental evaluations on diverse reasoning benchmarks demonstrate that cache steering improves both the qualitative structure of model reasoning and quantitative task performance. Compared to prior activation steering techniques that require continuous interventions, our one-shot cache steering offers substantial advantages in terms of hyperparameter stability, inference-time efficiency, and ease of integration, making it a more robust and practical solution for controlled generation.
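The abstract describes cache steering only at a high level: steering vectors are built from contrastive examples (prompts paired with GPT-4o reasoning traces versus the same prompts without them) and added once to the key-value cache before generation. The sketch below illustrates that one-shot mechanism on a toy NumPy "cache"; the shapes, the `fake_kv_cache` stand-in, and the scaling coefficients `c_k`, `c_v` are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Toy dimensions (hypothetical): a single attention layer whose cached
# keys/values have shape (seq_len, d_head). Real models have one cache
# per layer and head.
seq_len, d = 4, 8

def fake_kv_cache(prompt_seed):
    """Stand-in for a model's (keys, values) cache after encoding a prompt."""
    r = np.random.default_rng(prompt_seed)
    return r.normal(size=(seq_len, d)), r.normal(size=(seq_len, d))

# 1. Contrastive pairs: caches for prompts that include a reasoning
#    trace vs. the same prompts without one (random toys here).
with_trace = [fake_kv_cache(s) for s in (1, 2, 3)]
without_trace = [fake_kv_cache(s) for s in (11, 12, 13)]

# 2. Steering vectors: mean difference at the last cached position,
#    one vector for keys and one for values.
k_steer = np.mean(
    [kw[-1] - ko[-1] for (kw, _), (ko, _) in zip(with_trace, without_trace)],
    axis=0,
)
v_steer = np.mean(
    [vw[-1] - vo[-1] for (_, vw), (_, vo) in zip(with_trace, without_trace)],
    axis=0,
)

# 3. One-shot intervention: add the scaled vectors to a new prompt's
#    cache once, before decoding starts. Unlike activation steering,
#    no further intervention is applied at later decoding steps.
c_k, c_v = 0.5, 4.0  # hypothetical scaling coefficients
k_cache, v_cache = fake_kv_cache(42)
k_cache[-1] += c_k * k_steer
v_cache[-1] += c_v * v_steer
```

In a real model the same addition would be made to the framework's cached key/value tensors (e.g. the `past_key_values` object in Hugging Face Transformers) after prefill and before the first generated token.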