Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding
May 4, 2026
Authors: Taewon Yun, Jisu Shin, Jeonghwan Choi, Seunghwan Bang, Hwanjun Song
cs.AI
Abstract
Distilling large reasoning models (LRMs) is essential for making long chain-of-thought (Long-CoT) reasoning practical, as full-scale inference with these models remains computationally prohibitive. Existing curation-based approaches select complete reasoning traces post hoc; they overlook collaboration among heterogeneous teachers and lack dynamic exploration, which leads to redundant sampling and missed complementary reasoning. We introduce CoRD, a collaborative multi-teacher decoding framework that synthesizes reasoning step by step, guided by predictive perplexity-based scoring and beam search. This lets heterogeneous LRMs jointly construct coherent reasoning trajectories while efficiently preserving diverse, high-potential hypotheses. Experiments show that CoRD produces higher-quality reasoning data and that students trained on it reach near teacher-level performance with fewer, structured supervision signals, without substantial computational overhead. CoRD further generalizes well to out-of-domain and open-ended settings. The dataset and model are available at https://github.com/DISL-Lab/CoRD.
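The abstract describes CoRD only at a high level. To make the general idea concrete (several teachers propose candidate next reasoning steps for a shared prefix, each candidate is scored by perplexity, and beam search keeps only the best few partial traces), here is a minimal Python sketch. It is not the authors' implementation and every detail is an assumption: the names `collaborative_beam_search`, `Beam`, `teacher_a`, `teacher_b`, and `toy_scorer` are hypothetical, the teachers are stubs returning fixed strings instead of real LRMs, and plain per-step perplexity under a single scorer stands in for the paper's predictive perplexity score, which may be defined differently.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

# Hypothetical interfaces (assumptions, not CoRD's API):
# a teacher maps a partial trace to candidate next steps;
# a scorer returns per-token log-probabilities of a step given the prefix.
Teacher = Callable[[List[str]], List[str]]
Scorer = Callable[[List[str], str], List[float]]


@dataclass
class Beam:
    steps: List[str] = field(default_factory=list)
    total_cost: float = 0.0  # sum of per-step perplexities (lower is better)

    @property
    def avg_cost(self) -> float:
        # Length-normalized cost so longer traces are not penalized per step.
        return self.total_cost / len(self.steps) if self.steps else 0.0


def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp(-mean token log-probability); infinite for an empty step."""
    if not token_logprobs:
        return math.inf
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def collaborative_beam_search(teachers: Dict[str, Teacher], scorer: Scorer,
                              beam_width: int, max_steps: int,
                              is_final: Callable[[str], bool]) -> Beam:
    beams = [Beam()]
    for _ in range(max_steps):
        candidates: List[Beam] = []
        for beam in beams:
            if beam.steps and is_final(beam.steps[-1]):
                candidates.append(beam)  # finished traces carry over unchanged
                continue
            # Every teacher proposes next steps for the same shared prefix,
            # so one trace can mix steps written by different teachers.
            for propose in teachers.values():
                for step in propose(beam.steps):
                    cost = perplexity(scorer(beam.steps, step))
                    candidates.append(Beam(beam.steps + [step], beam.total_cost + cost))
        if not candidates:
            break
        # Beam-search pruning: keep the beam_width lowest-cost partial traces.
        beams = sorted(candidates, key=lambda b: b.avg_cost)[:beam_width]
        if all(b.steps and is_final(b.steps[-1]) for b in beams):
            break
    return beams[0]


# Toy usage with stub teachers and a lookup-table scorer (all hypothetical).
def teacher_a(prefix: List[str]) -> List[str]:
    return ["Let x = 3.", "Guess x = 5."] if not prefix else ["So 2x = 6. ANSWER"]


def teacher_b(prefix: List[str]) -> List[str]:
    return ["Set x to 3."] if not prefix else ["Hence 2x = 6. ANSWER"]


TOY_LOGPROBS = {"Let x = 3.": -0.5, "Guess x = 5.": -2.0, "Set x to 3.": -0.2,
                "So 2x = 6. ANSWER": -0.1, "Hence 2x = 6. ANSWER": -0.4}


def toy_scorer(prefix: List[str], step: str) -> List[float]:
    # Same log-probability for every whitespace token of the step.
    return [TOY_LOGPROBS[step]] * len(step.split())


best = collaborative_beam_search({"teacher_a": teacher_a, "teacher_b": teacher_b},
                                 toy_scorer, beam_width=2, max_steps=4,
                                 is_final=lambda s: s.endswith("ANSWER"))
print(best.steps)  # ['Set x to 3.', 'So 2x = 6. ANSWER']
```

In this toy run the winning trace takes its first step from `teacher_b` and its second from `teacher_a`, which is the kind of cross-teacher composition that selecting whole traces post hoc cannot produce. Averaging cost per step rather than summing it is a design choice made here so that finished short traces do not automatically beat longer ones.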