MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
June 16, 2025
Authors: MiniMax, Aili Chen, Aonian Li, Bangwei Gong, Binyang Jiang, Bo Fei, Bo Yang, Boji Shan, Changqing Yu, Chao Wang, Cheng Zhu, Chengjun Xiao, Chengyu Du, Chi Zhang, Chu Qiao, Chunhao Zhang, Chunhui Du, Congchao Guo, Da Chen, Deming Ding, Dianjun Sun, Dong Li, Enwei Jiao, Haigang Zhou, Haimo Zhang, Han Ding, Haohai Sun, Haoyu Feng, Huaiguang Cai, Haichao Zhu, Jian Sun, Jiaqi Zhuang, Jiaren Cai, Jiayuan Song, Jin Zhu, Jingyang Li, Jinhao Tian, Jinli Liu, Junhao Xu, Junjie Yan, Junteng Liu, Junxian He, Kaiyi Feng, Ke Yang, Kecheng Xiao, Le Han, Leyang Wang, Lianfei Yu, Liheng Feng, Lin Li, Lin Zheng, Linge Du, Lingyu Yang, Lunbin Zeng, Minghui Yu, Mingliang Tao, Mingyuan Chi, Mozhi Zhang, Mujie Lin, Nan Hu, Nongyu Di, Peng Gao, Pengfei Li, Pengyu Zhao, Qibing Ren, Qidi Xu, Qile Li, Qin Wang, Rong Tian, Ruitao Leng, Shaoxiang Chen, Shaoyu Chen, Shengmin Shi, Shitong Weng, Shuchang Guan, Shuqi Yu, Sichen Li, Songquan Zhu, Tengfei Li, Tianchi Cai, Tianrun Liang, Weiyu Cheng, Weize Kong, Wenkai Li, Xiancai Chen, Xiangjun Song, Xiao Luo, Xiao Su, Xiaobo Li, Xiaodong Han, Xinzhu Hou, Xuan Lu, Xun Zou, Xuyang Shen, Yan Gong, Yan Ma, Yang Wang, Yiqi Shi, Yiran Zhong, Yonghong Duan, Yongxiang Fu, Yongyi Hu, Yu Gao, Yuanxiang Fan, Yufeng Yang, Yuhao Li, Yulin Hu, Yunan Huang, Yunji Li, Yunzhi Xu, Yuxin Mao, Yuxuan Shi, Yuze Wenren, Zehan Li, Zelin Li, Zhanxu Tian, Zhengmao Zhu, Zhenhua Fan, Zhenzhen Wu, Zhichao Xu, Zhihang Yu, Zhiheng Lyu, Zhuo Jiang, Zibo Gao, Zijia Wu, Zijian Song, Zijun Sun
cs.AI
Abstract
We introduce MiniMax-M1, the world's first open-weight, large-scale
hybrid-attention reasoning model. MiniMax-M1 is powered by a hybrid
Mixture-of-Experts (MoE) architecture combined with a lightning attention
mechanism. The model is developed based on our previous MiniMax-Text-01 model,
which contains a total of 456 billion parameters with 45.9 billion parameters
activated per token. The M1 model natively supports a context length of 1
million tokens, 8x the context size of DeepSeek R1. Furthermore, the lightning
attention mechanism in MiniMax-M1 enables efficient scaling of test-time
compute. These properties make M1 particularly suitable for complex tasks that
require processing long inputs and thinking extensively. MiniMax-M1 is trained
using large-scale reinforcement learning (RL) on diverse problems, including
sandbox-based, real-world software engineering environments. In addition to
M1's inherent efficiency advantage for RL training, we propose CISPO, a novel
RL algorithm to further enhance RL efficiency. CISPO clips importance sampling
weights rather than token updates, outperforming other competitive RL variants.
Combining hybrid attention and CISPO enables MiniMax-M1's full RL training on
512 H800 GPUs to complete in only three weeks, with a rental cost of just
$534,700. We release two versions of the MiniMax-M1 model, with 40K and 80K
thinking budgets respectively, where the 40K model represents an intermediate
phase of the 80K training. Experiments on standard benchmarks show that our
models are comparable to or superior to strong open-weight models such as the
original DeepSeek-R1 and Qwen3-235B, with particular strengths in complex
software engineering, tool utilization, and long-context tasks. We publicly
release MiniMax-M1 at https://github.com/MiniMax-AI/MiniMax-M1.
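
The parameter figures in the abstract follow from MoE sparsity: with top-k expert routing, each token passes through only the selected experts, which is how 45.9 billion of the 456 billion total parameters can be active per token. Below is a minimal PyTorch sketch of a top-k MoE layer; the class name, layer sizes, and top-k value are illustrative assumptions, not MiniMax-M1's actual configuration.

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Minimal top-k Mixture-of-Experts layer (illustrative sizes only).

    Each token is routed to k of n experts, so only roughly k/n of the
    expert parameters are active per token -- the mechanism behind M1's
    45.9B active out of 456B total parameters.
    """
    def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                       # x: (n_tokens, d_model)
        gates = self.router(x).softmax(dim=-1)  # routing probabilities
        topv, topi = gates.topk(self.k, dim=-1)
        topv = topv / topv.sum(dim=-1, keepdim=True)  # renormalize gates
        out = torch.zeros_like(x)
        for slot in range(self.k):              # run only the chosen experts
            for e, expert in enumerate(self.experts):
                mask = topi[:, slot] == e
                if mask.any():
                    out[mask] += topv[mask, slot, None] * expert(x[mask])
        return out
```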
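
To make the efficiency claim concrete: lightning attention belongs to the linear-attention family, whose causal form can be computed with a fixed-size running state instead of a growing key-value cache, so per-token decoding cost stays flat as generations grow long. The sketch below shows that underlying recurrence; the elu+1 feature map and the single-head, unbatched shapes are illustrative assumptions, and the actual lightning attention kernel uses a blockwise, hardware-efficient formulation not reproduced here.

```python
import torch

def causal_linear_attention(q, k, v):
    """Sketch of the linear-attention recurrence behind lightning attention.

    q, k, v: (seq_len, d) float tensors for a single head. A fixed-size
    running state replaces the growing KV cache, so each decoded token
    costs O(d^2) regardless of sequence length -- the property that makes
    scaling test-time compute to long generations cheap.
    """
    # Feature map: elu(x) + 1 keeps scores positive (an illustrative choice).
    q = torch.nn.functional.elu(q) + 1
    k = torch.nn.functional.elu(k) + 1
    d_k, d_v = k.shape[-1], v.shape[-1]
    state = torch.zeros(d_k, d_v)   # running sum of outer(k_t, v_t)
    norm = torch.zeros(d_k)         # running sum of k_t, for normalization
    outputs = []
    for q_t, k_t, v_t in zip(q, k, v):
        state = state + torch.outer(k_t, v_t)
        norm = norm + k_t
        outputs.append((q_t @ state) / (q_t @ norm).clamp(min=1e-6))
    return torch.stack(outputs)
```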
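
The abstract's one-line description of CISPO, clipping importance sampling weights rather than token updates, can likewise be illustrated in a few lines. Under PPO-style clipping, tokens whose probability ratio leaves the trust region receive zero gradient; a CISPO-style loss instead clips and detaches the ratio, using it only as a weight on a REINFORCE-style log-probability term, so every token keeps a gradient. The sketch below is a hedged reading of that one sentence, not the paper's exact objective; the function names, epsilon bounds, and mean reduction are assumptions.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    # PPO-style baseline: the token update itself is clipped, so tokens
    # whose ratio falls outside [1 - eps, 1 + eps] stop receiving gradient.
    ratio = torch.exp(logp_new - logp_old)
    return -torch.min(ratio * advantages,
                      torch.clamp(ratio, 1 - eps, 1 + eps) * advantages).mean()

def cispo_style_loss(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.2):
    # CISPO-style sketch: clip the importance-sampling weight and detach it,
    # then use it to weight a REINFORCE-style log-prob term. Clipping caps
    # the weight's magnitude, but every token still contributes gradient
    # through logp_new. eps_low/eps_high are illustrative, not the paper's.
    weight = torch.clamp(torch.exp(logp_new - logp_old),
                         1 - eps_low, 1 + eps_high).detach()
    return -(weight * advantages * logp_new).mean()
```

All inputs are per-token tensors of equal shape (new-policy log-probs, behavior-policy log-probs, and advantages); the key design difference is where the clip is applied, not the clip range itself.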