O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning

Abstract

Recently, long-thought reasoning LLMs, such as OpenAI's O1, have adopted extended reasoning processes similar to how humans ponder complex problems. This reasoning paradigm significantly enhances the model's problem-solving abilities and has achieved promising results. However, the long-thought reasoning process leads to a substantial increase in inference time. A pressing challenge is to reduce the inference overhead of long-thought LLMs while preserving accuracy. In this paper, we experimentally demonstrate that long-thought reasoning models struggle to allocate token budgets effectively based on problem difficulty and reasoning redundancy. To address this, we propose Length-Harmonizing Fine-Tuning (O1-Pruner), which aims to minimize reasoning overhead while maintaining accuracy. This fine-tuning method first estimates the LLM's baseline performance through pre-sampling and then uses RL-style fine-tuning to encourage the model to generate shorter reasoning processes under accuracy constraints. This allows the model to reason efficiently, with lower redundancy, while maintaining accuracy. Experiments on various mathematical reasoning benchmarks show that O1-Pruner not only significantly reduces inference overhead but also achieves higher accuracy, providing a novel and promising solution to this challenge. Our code is coming soon at https://github.com/StarDewXXX/O1-Pruner
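The two stages named in the abstract (pre-sampling to estimate a baseline, then rewarding shorter outputs under an accuracy constraint) can be illustrated with a small sketch. The abstract does not state the actual objective, so the reward form below is an illustrative assumption, and the function names, the `lam` weight, and the sample data are all hypothetical.

```python
from statistics import mean


def estimate_baseline(samples):
    """Estimate per-problem baseline from pre-sampled reference outputs.

    samples: list of (length_in_tokens, is_correct) pairs drawn from the
    reference model for a single problem (hypothetical data layout).
    Returns (mean length, mean accuracy).
    """
    lengths = [length for length, _ in samples]
    correct = [1.0 if ok else 0.0 for _, ok in samples]
    return mean(lengths), mean(correct)


def length_harmonizing_reward(length, is_correct, ref_len, ref_acc, lam=2.0):
    """Assumed reward: favor outputs shorter than the baseline, and
    penalize accuracy that falls below the baseline.

    lam is a hypothetical weight trading off length against accuracy.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    # Positive when the output is shorter than the baseline average.
    length_term = ref_len / length - 1.0
    # Positive when the output is correct more often than the baseline.
    accuracy_term = (1.0 if is_correct else 0.0) - ref_acc
    return length_term + lam * accuracy_term


# Pre-sampled reference outputs for one problem (made-up numbers).
reference_samples = [(400, True), (600, True), (500, False), (500, True)]
ref_len, ref_acc = estimate_baseline(reference_samples)

short_correct = length_harmonizing_reward(250, True, ref_len, ref_acc)
long_wrong = length_harmonizing_reward(1000, False, ref_len, ref_acc)
```

Under this assumed form, a short correct answer scores above zero and a long incorrect one scores below zero, so an RL-style update that maximizes the reward would push the model toward shorter reasoning without giving up accuracy.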
