R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
May 27, 2025
Authors: Tianyu Fu, Yi Ge, Yichen You, Enshu Liu, Zhihang Yuan, Guohao Dai, Shengen Yan, Huazhong Yang, Yu Wang
cs.AI
Abstract
Large Language Models (LLMs) achieve impressive reasoning capabilities at the
cost of substantial inference overhead, posing significant deployment
challenges. Although distilled Small Language Models (SLMs) significantly
enhance efficiency, their performance suffers as they fail to follow LLMs'
reasoning paths. Fortunately, we find that only a small fraction of tokens
genuinely causes the reasoning paths of LLMs and SLMs to diverge. Most generated tokens
are either identical or exhibit neutral differences, such as minor variations
in abbreviations or expressions. Leveraging this insight, we introduce **Roads
to Rome (R2R)**, a neural token routing method that selectively utilizes LLMs
only for these critical, path-divergent tokens, while leaving the majority of
token generation to the SLM. We also develop an automatic data generation
pipeline that identifies divergent tokens and generates token-level routing
labels to train the lightweight router. We apply R2R to combine R1-1.5B and
R1-32B models from the DeepSeek family, and evaluate on challenging math,
coding, and QA benchmarks. With an average activated parameter size of 5.6B,
R2R surpasses the average accuracy of R1-7B by 1.6x, outperforming even the
R1-14B model. Compared to R1-32B, it delivers a 2.8x wall-clock speedup with
comparable performance, advancing the Pareto frontier of test-time scaling
efficiency. Our code is available at https://github.com/thu-nics/R2R.
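
The routing scheme described in the abstract suggests a simple decode-time loop: the SLM proposes each next token, and a lightweight router decides whether that particular step should instead be taken by the LLM. Below is a minimal Python sketch of such a loop; the interfaces `slm_next_token`, `llm_next_token`, and `router_is_divergent` are hypothetical placeholders for illustration, not the actual API of the R2R repository.

```python
# Minimal sketch of token-level small-large routing at decode time.
# All callables are hypothetical placeholders standing in for real models.
from typing import Callable, List


def route_and_decode(
    prompt_ids: List[int],
    slm_next_token: Callable[[List[int]], int],       # small model proposes a token
    llm_next_token: Callable[[List[int]], int],       # large model, invoked sparingly
    router_is_divergent: Callable[[List[int], int], bool],  # lightweight router
    eos_id: int,
    max_new_tokens: int = 256,
) -> List[int]:
    """Generate with the SLM; defer to the LLM only on tokens the router
    predicts would push the reasoning path off course."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        candidate = slm_next_token(ids)           # cheap proposal for this step
        if router_is_divergent(ids, candidate):   # predicted path-divergent token
            candidate = llm_next_token(ids)       # take the LLM's token instead
        ids.append(candidate)
        if candidate == eos_id:
            break
    return ids
```

Because the router flags only a small fraction of positions, most decoding steps stay on the SLM, which is what keeps the average activated parameter count low.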
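The abstract also mentions an automatic pipeline that identifies divergent tokens and produces token-level routing labels for training the router. One plausible way to frame that labeling, sketched under the assumption that a mismatched SLM token counts as divergent only when continuing from it changes the final outcome (rather than being a neutral wording variation), is shown below; `llm_continue` and `same_outcome` are illustrative placeholders, not the paper's exact procedure.

```python
# Hypothetical sketch of token-level divergence labeling for router training.
from typing import Callable, List, Tuple


def label_divergent_tokens(
    reference_ids: List[int],                               # LLM reference trace
    slm_next_token: Callable[[List[int]], int],             # SLM greedy proposal
    llm_continue: Callable[[List[int]], List[int]],         # LLM finishes a prefix
    same_outcome: Callable[[List[int], List[int]], bool],   # verifier / judge
) -> List[Tuple[int, int, bool]]:
    """Return (position, slm_token, is_divergent) labels along the reference."""
    labels = []
    for t in range(1, len(reference_ids)):
        prefix = reference_ids[:t]
        slm_tok = slm_next_token(prefix)
        ref_tok = reference_ids[t]
        if slm_tok == ref_tok:
            labels.append((t, slm_tok, False))   # identical token: keep the SLM
            continue
        # Mismatch: continue the SLM branch with the LLM and compare outcomes.
        slm_branch = prefix + [slm_tok] + llm_continue(prefix + [slm_tok])
        divergent = not same_outcome(slm_branch, reference_ids)
        labels.append((t, slm_tok, divergent))   # True => route this step to LLM
    return labels
```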