o1-Coder: an o1 Replication for Coding

Abstract

This technical report introduces O1-CODER, an attempt to replicate OpenAI's o1 model with a focus on coding tasks. It integrates reinforcement learning (RL) and Monte Carlo Tree Search (MCTS) to enhance the model's System-2 thinking capabilities. The framework includes training a Test Case Generator (TCG) for standardized code testing, using MCTS to generate code data with reasoning processes, and iteratively fine-tuning the policy model to first produce pseudocode and then the full code. The report also addresses the opportunities and challenges of deploying o1-like models in real-world applications, suggesting a transition to the System-2 paradigm and highlighting the need for environment state updates. Updated model progress and experimental results will be reported in subsequent versions. All source code, curated datasets, and derived models will be released at https://github.com/ADaM-BJTU/O1-CODER.
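
To make the idea of standardized code testing concrete, the following is a minimal sketch of how a set of test cases could be used to score a candidate solution. It is an illustration under stated assumptions, not the report's implementation: the function name `pass_rate`, the representation of test cases as `(input, expected_output)` pairs, and the choice of pass rate as the score are all hypothetical.

```python
from typing import Any, Callable, Iterable, Tuple


def pass_rate(
    candidate: Callable[[Any], Any],
    test_cases: Iterable[Tuple[Any, Any]],
) -> float:
    """Return the fraction of test cases that `candidate` passes.

    Hypothetical scoring rule: each test case is an (input, expected_output)
    pair, and any exception raised by the candidate counts as a failure.
    """
    cases = list(test_cases)
    if not cases:
        # With no test cases there is no evidence of correctness.
        return 0.0

    passed = 0
    for arg, expected in cases:
        try:
            if candidate(arg) == expected:
                passed += 1
        except Exception:
            # A crashing candidate fails this case but scoring continues.
            pass
    return passed / len(cases)


if __name__ == "__main__":
    # Toy test cases for "square a number" (illustrative data only).
    cases = [(2, 4), (3, 9), (-1, 1)]
    print(pass_rate(lambda x: x * x, cases))
    print(pass_rate(lambda x: x + x, cases))
```

In this toy setup a correct squaring function passes every case, while `x + x` passes only the case where the input is 2. A score of this kind is one plausible way to compare candidate programs against a fixed test suite.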
