O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
January 22, 2025
Authors: Haotian Luo, Li Shen, Haiying He, Yibo Wang, Shiwei Liu, Wei Li, Naiqiang Tan, Xiaochun Cao, Dacheng Tao
cs.AI
Abstract
Recently, long-thought reasoning LLMs, such as OpenAI's O1, have adopted extended
reasoning processes similar to how humans ponder complex problems. This
reasoning paradigm significantly enhances the model's problem-solving abilities
and has achieved promising results. However, the long-thought reasoning process
leads to a substantial increase in inference time. A pressing challenge is
reducing the inference overhead of long-thought LLMs while preserving accuracy.
In this paper, we experimentally demonstrate that long-thought reasoning models
struggle to effectively allocate token budgets based on problem difficulty and
reasoning redundancies. To address this, we propose Length-Harmonizing
Fine-Tuning (O1-Pruner), which aims to minimize reasoning overhead while
maintaining accuracy. This fine-tuning method first estimates the
LLM's baseline performance through pre-sampling and then uses RL-style
fine-tuning to encourage the model to generate shorter reasoning processes
under accuracy constraints. This allows the model to achieve efficient
reasoning with lower redundancy while maintaining accuracy. Experiments on
various mathematical reasoning benchmarks show that O1-Pruner not only
significantly reduces inference overhead but also achieves higher accuracy,
providing a novel and promising solution to this challenge. Our code is coming
soon at https://github.com/StarDewXXX/O1-Pruner
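
The two-stage idea described in the abstract (pre-sample to estimate a baseline, then reward shorter reasoning subject to accuracy) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not give the reward formula, so the ratio-based length term, the accuracy-difference term, the weight `lam`, and the function names `presample_baseline` and `length_accuracy_reward` are hypothetical and should not be read as the authors' implementation.

```python
from statistics import mean


def presample_baseline(samples):
    """Estimate baseline length and accuracy for one problem.

    `samples` is a list of (num_tokens, is_correct) pairs, assumed to be
    drawn from the reference model before fine-tuning.
    """
    lengths = [num_tokens for num_tokens, _ in samples]
    accuracies = [1.0 if ok else 0.0 for _, ok in samples]
    return mean(lengths), mean(accuracies)


def length_accuracy_reward(num_tokens, is_correct, ref_len, ref_acc, lam=2.0):
    """Hypothetical reward: positive when a response is shorter than the
    baseline, adjusted by how its correctness compares with baseline accuracy.

    `lam` (an arbitrary illustrative value) weights the accuracy term so that
    shortening at the cost of correctness is penalized.
    """
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    length_term = ref_len / num_tokens - 1.0
    accuracy_term = (1.0 if is_correct else 0.0) - ref_acc
    return length_term + lam * accuracy_term


# Made-up pre-samples for a single problem: (token count, correct?)
ref_samples = [(400, True), (600, True), (500, False), (500, True)]
ref_len, ref_acc = presample_baseline(ref_samples)

# A short, correct response scores higher than a long, incorrect one.
short_correct = length_accuracy_reward(250, True, ref_len, ref_acc)
long_wrong = length_accuracy_reward(1000, False, ref_len, ref_acc)
print(short_correct, long_wrong)
```

In an RL-style loop, a score of this kind would be computed per sampled response and used as the training signal; how the paper actually optimizes it is not specified in the abstract.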