AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization
April 30, 2025
Authors: Haotian Luo, Haiying He, Yibo Wang, Jinluan Yang, Rui Liu, Naiqiang Tan, Xiaochun Cao, Dacheng Tao, Li Shen
cs.AI
Abstract
Recently, long-thought reasoning models have achieved strong performance on complex
reasoning tasks, but often incur substantial inference overhead, making
efficiency a critical concern. Our empirical analysis reveals that the benefit
of using Long-CoT varies across problems: while some problems require elaborate
reasoning, others show no improvement, or even degraded accuracy. This
motivates adaptive reasoning strategies that tailor reasoning depth to the
input. However, prior work primarily reduces redundancy within long reasoning
paths, limiting exploration of more efficient strategies beyond the Long-CoT
paradigm. To address this, we propose a novel two-stage framework for adaptive
and efficient reasoning. First, we construct a hybrid reasoning model by
merging long and short CoT models to enable diverse reasoning styles. Second,
we apply bi-level preference training to guide the model to select suitable
reasoning styles (group-level), and prefer concise and correct reasoning within
each style group (instance-level). Experiments demonstrate that our method
significantly reduces inference costs compared to other baseline approaches,
while maintaining performance. Notably, on five mathematical datasets, the
average length of reasoning is reduced by more than 50%, highlighting the
potential of adaptive strategies to optimize reasoning efficiency in large
language models. Our code is coming soon at https://github.com/StarDewXXX/AdaR1
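The two-stage recipe in the abstract (merge a long-CoT and a short-CoT model into a hybrid, then apply bi-level preference training) can be illustrated with a toy sketch. The abstract does not specify the merging scheme or how preference pairs are built, so the linear weight interpolation and the pair-construction heuristics below are assumptions for illustration only, not the paper's actual method.

```python
def merge_models(long_weights, short_weights, alpha=0.5):
    """Blend two same-shaped weight dicts by linear interpolation.
    Assumed merging scheme; real merging operates on full parameter tensors."""
    return {k: alpha * long_weights[k] + (1 - alpha) * short_weights[k]
            for k in long_weights}

def build_bilevel_pairs(problem_samples):
    """Construct (chosen, rejected) preference pairs at two levels for one
    problem. Each sample is a dict with 'style' ('long' or 'short'),
    'correct' (bool), 'length' (int), and 'text' (str). Heuristics are
    illustrative guesses at what bi-level preference data might look like."""
    pairs = []
    styles = {}
    for s in problem_samples:
        styles.setdefault(s['style'], []).append(s)

    def accuracy(group):
        return sum(s['correct'] for s in group) / len(group)

    def avg_len(group):
        return sum(s['length'] for s in group) / len(group)

    # Group level: prefer the style with higher accuracy; break ties
    # toward the shorter (more efficient) style.
    if set(styles) == {'long', 'short'}:
        g_long, g_short = styles['long'], styles['short']
        if (accuracy(g_short), -avg_len(g_short)) >= (accuracy(g_long), -avg_len(g_long)):
            better, worse = g_short, g_long
        else:
            better, worse = g_long, g_short
        pairs.append((max(better, key=lambda s: s['correct']),
                      max(worse, key=lambda s: s['correct'])))

    # Instance level: within each style, prefer the shortest correct answer
    # over the longest (or an incorrect) one.
    for group in styles.values():
        correct = [s for s in group if s['correct']]
        if correct and len(group) > 1:
            chosen = min(correct, key=lambda s: s['length'])
            rejected = max(group, key=lambda s: (not s['correct'], s['length']))
            if chosen is not rejected:
                pairs.append((chosen, rejected))
    return pairs
```

The group-level pairs teach the hybrid model which reasoning style to pick per input, while the instance-level pairs push it toward concise, correct traces within the chosen style.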