

KV Cache Steering for Inducing Reasoning in Small Language Models

July 11, 2025
作者: Max Belitsky, Dawid J. Kopiczko, Michael Dorkenwald, M. Jehanzeb Mirza, Cees G. M. Snoek, Yuki M. Asano
cs.AI

Abstract

We propose cache steering, a lightweight method for implicit steering of language models via a one-shot intervention applied directly to the key-value cache. To validate its effectiveness, we apply cache steering to induce chain-of-thought reasoning in small language models. Our approach leverages GPT-4o-generated reasoning traces to construct steering vectors that shift model behavior toward more explicit, multi-step reasoning without fine-tuning or prompt modifications. Experimental evaluations on diverse reasoning benchmarks demonstrate that cache steering improves both the qualitative structure of model reasoning and quantitative task performance. Compared to prior activation steering techniques that require continuous interventions, our one-shot cache steering offers substantial advantages in terms of hyperparameter stability, inference-time efficiency, and ease of integration, making it a more robust and practical solution for controlled generation.
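The core operation the abstract describes, building steering vectors from cached activations and applying them in a single edit to the key-value cache, can be sketched as plain tensor arithmetic. This is a minimal illustrative sketch, not the authors' implementation: the function names, the mean-difference construction, the choice to steer only the final prompt position, and the scaling parameters `alpha`/`beta` are all assumptions for exposition.

```python
import numpy as np

def build_steering_vector(states_pos, states_neg):
    """Mean difference between cached states gathered from prompts that
    include reasoning traces (positive) and plain prompts (negative).

    states_pos, states_neg: arrays of shape (num_prompts, head_dim).
    Returns a single steering vector of shape (head_dim,).
    """
    return states_pos.mean(axis=0) - states_neg.mean(axis=0)

def steer_kv_cache(past_key_values, key_vec, value_vec, alpha=1.0, beta=1.0):
    """One-shot cache edit: add scaled steering vectors to the keys and
    values of every layer at the final prompt position, then generate as
    usual from the modified cache. No further intervention is needed.

    past_key_values: list of (keys, values) per layer, each of shape
    (batch, num_heads, seq_len, head_dim), as in a typical KV cache.
    """
    steered = []
    for keys, values in past_key_values:
        k = keys.copy()
        v = values.copy()
        # Edit only the last prompt token; broadcasting applies the same
        # steering vector across batch and heads.
        k[..., -1, :] += alpha * key_vec
        v[..., -1, :] += beta * value_vec
        steered.append((k, v))
    return steered

# Toy usage with random tensors standing in for real cached activations.
rng = np.random.default_rng(0)
pos = rng.normal(size=(4, 8))   # states from reasoning-trace prompts
neg = rng.normal(size=(4, 8))   # states from plain prompts
vec = build_steering_vector(pos, neg)
cache = [(rng.normal(size=(1, 2, 5, 8)), rng.normal(size=(1, 2, 5, 8)))]
steered = steer_kv_cache(cache, vec, vec, alpha=2.0, beta=0.5)
```

Because the edit happens once, before decoding starts, the per-token generation cost is unchanged, which is the inference-efficiency advantage the abstract contrasts with activation steering methods that intervene at every step.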