Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning

June 2, 2026
Authors: Sanket Badhe, Deep Shah
cs.AI

Abstract

Advanced reasoning typically requires Chain-of-Thought prompting, which is accurate but incurs prohibitive latency and substantial test-time inference costs. The standard alternative, fine-tuning smaller models, often sacrifices interpretability while introducing significant resource and operational overhead. To address these limitations, we introduce Prompt-Level Distillation (PLD). We extract explicit reasoning patterns from a Teacher model and organize them into a structured list of expressive instructions for the Student model's System Prompt. Evaluated on Gemma-3 4B, PLD improved Macro F1 on StereoSet from 57% to 90.0% and on Contract-NLI from 67% to 83%, and raised LogiQA accuracy to 70%. Similar results on Mistral Small 3.1 demonstrate cross-architecture generalizability, enabling these compact models to match frontier performance with negligible latency overhead. Because the instructions are explicit, the decision-making process is transparent and the logic can be fully verified by a human. This makes the approach well suited to regulated industries such as law, finance, and content moderation, as well as to high-volume use cases and edge devices.
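To make the idea of "a structured list of instructions in the Student model's System Prompt" concrete, here is a minimal sketch of how such a prompt might be assembled. It is not the authors' implementation: the function names (`consolidate_instructions`, `build_system_prompt`), the example instructions, the label names, and the closing "label only" line are all assumptions made for illustration, and the teacher-side extraction step is not shown.

```python
from typing import Iterable, List


def consolidate_instructions(candidates: Iterable[str]) -> List[str]:
    """Drop blank and duplicate instructions (case-insensitive), keeping first-seen order."""
    seen = set()
    kept = []
    for text in candidates:
        normalized = " ".join(text.split())  # collapse runs of whitespace
        key = normalized.lower()
        if normalized and key not in seen:
            seen.add(key)
            kept.append(normalized)
    return kept


def build_system_prompt(task_description: str, instructions: Iterable[str]) -> str:
    """Render the task description and a numbered instruction list as one system prompt."""
    rules = consolidate_instructions(instructions)
    lines = [task_description.strip(), "", "Follow these instructions when deciding:"]
    lines += [f"{i}. {rule}" for i, rule in enumerate(rules, start=1)]
    # Assumption: the student answers directly, without a chain-of-thought trace.
    lines += ["", "Answer with the label only."]
    return "\n".join(lines)


# Hypothetical instructions, imagined as extracted from a teacher model's reasoning.
teacher_instructions = [
    "If the hypothesis restates a clause of the contract, answer Entailment.",
    "if the hypothesis restates a clause of  the contract, answer entailment.",
    "If the contract never mentions the topic, answer NotMentioned.",
]

system_prompt = build_system_prompt(
    "Classify whether the contract entails the hypothesis.",
    teacher_instructions,
)
print(system_prompt)
```

Because the resulting prompt is plain text, a reviewer can read, edit, or veto each numbered rule before deployment, which is the kind of human verification the abstract describes.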