MiniMax-M2 Series: Mini Activations Unleash Maximum Real-World Intelligence

Abstract

We introduce the MiniMax-M2 series, a family of Mixture-of-Experts language models built on the principle that mini activations can unleash maximum real-world intelligence. The flagship M2 has 229.9B total parameters, of which only 9.8B are activated per token. Designed end-to-end for agentic deployment, the M2 series rests on three components: (i) agent-driven data pipelines that produce large-scale, verifiable trajectories across agentic coding and agentic cowork, each grounded in an executable workspace and an artifact-aligned reward; (ii) Forge, a scalable agent-native RL system that adapts to long-horizon agent trajectories, combining windowed-FIFO scheduling, prefix-tree merging, inference optimization, and a clean training-inference-agent decoupling that supports both white-box and black-box agents; and (iii) the latest checkpoint, M2.7, which takes an early step toward self-evolution by autonomously debugging training runs and modifying its own scaffold. From M2 through M2.7, this combination translates a mini-activation footprint into frontier-tier performance on agentic coding, deep search, office-task, and reasoning benchmarks.
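
To make the "mini activation" claim concrete, the short sketch below works out what share of M2's parameters is used per token, using only the two figures quoted in the abstract. The helper name `activation_ratio` is illustrative and does not come from the paper. The calculation also says nothing about how the activated parameters are divided between shared and expert layers, which the abstract does not specify.

```python
# Figures quoted in the abstract for the flagship M2 model.
TOTAL_PARAMS_B = 229.9   # total parameters, in billions
ACTIVE_PARAMS_B = 9.8    # parameters activated per token, in billions


def activation_ratio(active_b: float, total_b: float) -> float:
    """Return the fraction of parameters used for a single token.

    Hypothetical helper for illustration only; not part of any MiniMax API.
    """
    if total_b <= 0:
        raise ValueError("total_b must be positive")
    return active_b / total_b


ratio = activation_ratio(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"activated per token: {ratio:.2%}")        # activated per token: 4.26%
print(f"total-to-active ratio: {1 / ratio:.1f}x")  # total-to-active ratio: 23.5x
```

By these figures, each token touches roughly 4% of the model's weights, or about one parameter in every 23.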