O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning

Abstract

Recently, long-thought reasoning LLMs such as OpenAI's O1 have adopted extended reasoning processes similar to how humans ponder complex problems. This reasoning paradigm significantly enhances the model's problem-solving abilities and has achieved promising results. However, the long-thought reasoning process leads to a substantial increase in inference time. A pressing challenge is to reduce the inference overhead of long-thought LLMs while preserving accuracy. In this paper, we experimentally demonstrate that long-thought reasoning models struggle to allocate token budgets effectively according to problem difficulty and reasoning redundancy. To address this, we propose Length-Harmonizing Fine-Tuning (O1-Pruner), which aims to minimize reasoning overhead while maintaining accuracy. This fine-tuning method first estimates the LLM's baseline performance through pre-sampling and then uses RL-style fine-tuning to encourage the model to generate shorter reasoning processes under accuracy constraints. This allows the model to reason efficiently with lower redundancy while maintaining accuracy. Experiments on various mathematical reasoning benchmarks show that O1-Pruner not only significantly reduces inference overhead but also achieves higher accuracy, providing a novel and promising solution to this challenge. Our code is coming soon at https://github.com/StarDewXXX/O1-Pruner
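
The two-stage idea described in the abstract (pre-sample to estimate a baseline, then reward shorter reasoning under an accuracy constraint) can be illustrated with a small sketch. This is not the released O1-Pruner code: the abstract does not give the objective, so the reward form below, the `Sample` type, the function names, and the `acc_weight` parameter are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    """One generated solution (hypothetical container, not from the paper)."""
    num_tokens: int  # length of the generated reasoning, in tokens
    correct: bool    # whether the final answer matched the reference


def estimate_baseline(pre_samples):
    """Pre-sampling step: average length and accuracy of the reference
    model's solutions to a single problem."""
    base_len = mean(s.num_tokens for s in pre_samples)
    base_acc = mean(1.0 if s.correct else 0.0 for s in pre_samples)
    return base_len, base_acc


def length_harmonizing_reward(sample, base_len, base_acc, acc_weight=2.0):
    """Assumed reward: positive when the sample is shorter than the baseline,
    plus a weighted term that penalizes falling below baseline accuracy."""
    length_term = base_len / max(sample.num_tokens, 1) - 1.0
    accuracy_term = (1.0 if sample.correct else 0.0) - base_acc
    return length_term + acc_weight * accuracy_term


# Usage: score new samples against the baseline for one problem.
pre_samples = [Sample(1000, True), Sample(3000, False)]
base_len, base_acc = estimate_baseline(pre_samples)
short_and_right = length_harmonizing_reward(Sample(1000, True), base_len, base_acc)
long_and_wrong = length_harmonizing_reward(Sample(4000, False), base_len, base_acc)
```

In an RL-style fine-tuning loop, a scalar of this kind would act as the per-sample reward. Measuring length relative to a per-problem baseline, rather than as an absolute token count, is one way to let harder problems keep a larger token budget.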
