Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions

Abstract

OpenAI o1 has recently sparked a surge of interest in the study of large reasoning models (LRMs). Building on this momentum, Marco-o1 not only targets disciplines with standard answers, such as mathematics, physics, and coding, which are well suited to reinforcement learning (RL), but also places greater emphasis on open-ended resolutions. We aim to address the question: "Can the o1 model effectively generalize to broader domains where clear standards are absent and rewards are challenging to quantify?" Marco-o1 is powered by Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), reflection mechanisms, and innovative reasoning strategies, and is optimized for complex real-world problem-solving tasks.