Distilling Long Chain-of-Thought Reasoning via Collaborative Step-wise Multi-Teacher Decoding

Abstract

Distilling large reasoning models (LRMs) is essential for making long chain-of-thought (Long-CoT) reasoning practical, as full-scale inference remains computationally prohibitive. Existing curation-based approaches select complete reasoning traces post hoc. They overlook collaboration among heterogeneous teachers and lack dynamic exploration, which leads to redundant sampling and missed complementary reasoning. We introduce CoRD, a collaborative multi-teacher decoding framework that performs step-wise reasoning synthesis guided by predictive perplexity-based scoring and beam search. This enables heterogeneous LRMs to jointly construct coherent reasoning trajectories while efficiently preserving diverse, high-potential hypotheses. Experiments show that CoRD produces higher-quality reasoning data and achieves near teacher-level student performance with fewer, structured supervision signals, without substantial efficiency overhead. CoRD further generalizes well to out-of-domain and open-ended settings. The dataset and model are available at https://github.com/DISL-Lab/CoRD.
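The step-wise synthesis described above can be illustrated with a minimal sketch. It is not the CoRD implementation: the abstract does not define the scoring function, so the sketch assumes that "predictive perplexity" means the perplexity of a reference answer conditioned on a partial trajectory, with lower values being better. The names `Teacher`, `AnswerLogProbs`, `collaborative_beam_search`, and the toy teachers and scorer are all hypothetical, and stopping criteria are reduced to a fixed step budget.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical interfaces, not taken from the CoRD codebase.
# A teacher proposes candidate next reasoning steps for a partial trajectory.
Teacher = Callable[[Sequence[str]], List[str]]
# A scorer returns per-token log-probabilities of a reference answer
# given a partial trajectory (assumed reading of "predictive perplexity").
AnswerLogProbs = Callable[[Sequence[str]], List[float]]


def perplexity(log_probs: Sequence[float]) -> float:
    """Perplexity = exp of the negative mean token log-probability."""
    return math.exp(-sum(log_probs) / len(log_probs))


def collaborative_beam_search(
    teachers: Sequence[Teacher],
    answer_log_probs: AnswerLogProbs,
    beam_width: int,
    max_steps: int,
) -> List[List[str]]:
    """Extend each beam with steps from every teacher, keep the lowest-perplexity ones."""
    beams: List[List[str]] = [[]]
    for _ in range(max_steps):
        candidates = []
        for prefix in beams:
            for propose in teachers:
                for step in propose(prefix):
                    trajectory = prefix + [step]
                    score = perplexity(answer_log_probs(trajectory))
                    candidates.append((score, trajectory))
        if not candidates:
            break
        # Lower perplexity first; the sort is stable, so ties keep proposal order.
        candidates.sort(key=lambda item: item[0])
        beams = [trajectory for _, trajectory in candidates[:beam_width]]
    return beams


# Toy stand-ins: each teacher is only helpful on alternating steps.
def teacher_a(prefix: Sequence[str]) -> List[str]:
    return ["good" if len(prefix) % 2 == 0 else "bad"]


def teacher_b(prefix: Sequence[str]) -> List[str]:
    return ["bad" if len(prefix) % 2 == 0 else "good"]


def toy_answer_log_probs(trajectory: Sequence[str]) -> List[float]:
    # More "good" steps make the (three-token) answer more predictable.
    n_good = list(trajectory).count("good")
    return [-2.0 / (1 + n_good)] * 3


beams = collaborative_beam_search(
    [teacher_a, teacher_b], toy_answer_log_probs, beam_width=2, max_steps=2
)
print(beams[0])
```

In this toy setup neither teacher alone produces the best trajectory; the top beam takes its first step from `teacher_a` and its second from `teacher_b`, which is the kind of complementary reasoning that selecting whole traces after the fact cannot recover.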