LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding

Abstract

Speculative decoding accelerates autoregressive large language model (LLM) inference by using a lightweight draft model to propose candidate tokens that are then verified in parallel by the target model. The speedup depends heavily on the acceptance rate, yet standard training minimizes Kullback-Leibler (KL) divergence as a proxy objective. While KL divergence and acceptance rate share the same global optimum, small draft models with limited capacity typically converge to suboptimal solutions where minimizing KL does not guarantee maximizing acceptance rate. To address this issue, we propose LK losses, training objectives that directly target the acceptance rate. Comprehensive experiments across four draft architectures and six target models, ranging from 8B to 685B parameters, demonstrate consistent improvements in acceptance metrics across all configurations compared to standard KL-based training. We evaluate our approach on general, coding, and math domains and report gains of up to 8-10% in average acceptance length. LK losses are easy to implement, introduce no computational overhead, and can be directly integrated into any existing speculator training framework, making them a compelling alternative to existing draft training objectives.
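The gap between KL divergence and acceptance rate can be illustrated with a toy example. The sketch below is not the LK loss itself, which the abstract does not define. It assumes the standard speculative sampling rule, under which a draft token x sampled from the draft distribution q is accepted with probability min(1, p(x)/q(x)), so the expected per-token acceptance rate is the sum over x of min(p(x), q(x)). It also assumes the proxy objective is the forward KL divergence KL(p || q). The function names and the two-token distributions are hypothetical and chosen only for illustration.

```python
import math

def acceptance_rate(p, q):
    """Expected per-token acceptance under standard speculative sampling.

    Assumes a draft token x ~ q is accepted with probability min(1, p(x)/q(x)),
    which gives sum_x min(p(x), q(x)) in expectation.
    """
    return sum(min(pi, qi) for pi, qi in zip(p, q))

def kl_divergence(p, q):
    """Forward KL divergence KL(p || q) in nats (assumed proxy objective)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical two-token vocabulary: one target and two imperfect drafts.
target = [0.9, 0.1]
draft_a = [0.99, 0.01]  # overconfident on the likely token
draft_b = [0.75, 0.25]  # spreads too much mass onto the unlikely token

for name, draft in [("draft_a", draft_a), ("draft_b", draft_b)]:
    print(
        f"{name}: KL = {kl_divergence(target, draft):.4f}, "
        f"acceptance = {acceptance_rate(target, draft):.4f}"
    )
```

In this toy setting, draft_b has the lower KL divergence (about 0.072 versus 0.144 nats) yet also the lower acceptance rate (0.85 versus 0.91), so ranking imperfect drafts by KL does not necessarily rank them by acceptance rate. Both metrics agree only at the shared optimum, where the draft matches the target exactly.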
