프롬프트 수준 증류: 효율적 추론을 위한 모델 미세 조정의 비모수적 대안

초록

고급 추론은 일반적으로 Chain-of-Thought 프롬프팅을 필요로 하며, 이는 정확하지만 과도한 지연 시간과 상당한 테스트 시간 추론 비용을 초래한다. 대안인 소형 모델의 미세 조정은 종종 해석 가능성을 희생하면서 상당한 리소스와 운영 오버헤드를 도입한다. 이러한 한계를 해결하기 위해, 우리는 프롬프트 수준 증류(PLD)를 도입한다. 우리는 교사 모델로부터 명시적 추론 패턴을 추출하여 학생 모델의 시스템 프롬프트에 대한 표현적 지침의 구조화된 목록으로 구성한다. Gemma-3 4B를 사용하여 평가한 결과, PLD는 StereoSet에서 매크로 F1 점수를 57%에서 90.0%로, Contract-NLI에서 67%에서 83%로 향상시켰으며, LogiQA 정확도를 70%로 증가시켰다. Mistral Small 3.1에서 유사한 결과는 교차 아키텍처 일반화 가능성을 입증하며, 이러한 소형 모델이 무시할 수 있는 지연 시간 오버헤드로 최첨단 성능에 도달할 수 있게 한다. 이러한 표현적 지침은 의사 결정 과정을 투명하게 만들어 논리에 대한 완전한 인간 검증을 가능하게 하므로, 이 접근 방식은 법률, 금융, 콘텐츠 조정과 같은 규제 산업뿐만 아니라 대량 사용 사례 및 엣지 디바이스에 이상적이다.

English

Advanced reasoning typically requires Chain-of-Thought prompting, which is accurate but incurs prohibitive latency and substantial test-time inference costs. The standard alternative, fine-tuning smaller models, often sacrifices interpretability while introducing significant resource and operational overhead. To address these limitations, we introduce Prompt-Level Distillation (PLD). We extract explicit reasoning patterns from a Teacher model and organize them into a structured list of expressive instructions for the Student model's System Prompt. Evaluated using Gemma-3 4B, PLD improved Macro F1 scores on StereoSet (57\% to 90.0\%) and Contract-NLI (67\% to 83\%), while increasing LogiQA accuracy to 70\%. Similar results on Mistral Small 3.1 demonstrate cross-architecture generalizability, enabling these compact models to match frontier performance with negligible latency overhead. These expressive instructions render the decision-making process transparent, allowing for full human verification of logic, making this approach ideal for regulated industries such as law, finance, and content moderation, as well as high-volume use cases and edge devices.