MiniMax-M2 Series: Mini Activations Unleashing Maximum Real-World Intelligence

Abstract

We introduce the MiniMax-M2 series, a family of Mixture-of-Experts language models built around the principle that mini activations can unleash maximum real-world intelligence. The flagship M2 contains 229.9B total parameters, of which only 9.8B are activated per token. Designed end-to-end for agentic deployment, the M2 series rests on three components: (i) agent-driven data pipelines that produce large-scale, verifiable trajectories across agentic coding and agentic cowork, each grounded in an executable workspace and an artifact-aligned reward; (ii) Forge, a scalable agent-native RL system that adapts to long-horizon agent trajectories, paired with windowed-FIFO scheduling, prefix-tree merging, inference optimization, and a clean training-inference-agent decoupling that supports both white-box and black-box agents; and (iii) an early step toward self-evolution, in which the latest M2.7 checkpoint autonomously debugs training runs and modifies its own scaffold. Across M2 through M2.7, this combination translates a mini-activation footprint into frontier-tier performance on agentic coding, deep search, office-task, and reasoning benchmarks.
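
To make the "mini activation" claim concrete, the following minimal sketch works out the per-token activation fraction from the two parameter counts quoted in the abstract. The function name `activation_fraction` and the constant names are hypothetical and chosen only for illustration; the calculation assumes nothing about the model beyond those two figures.

```python
# Parameter counts as quoted in the abstract, in billions.
TOTAL_PARAMS_B = 229.9
ACTIVE_PARAMS_B = 9.8


def activation_fraction(active_b: float, total_b: float) -> float:
    """Return the share of parameters activated per token (hypothetical helper)."""
    if total_b <= 0 or not 0 <= active_b <= total_b:
        raise ValueError("expected 0 <= active_b <= total_b and total_b > 0")
    return active_b / total_b


fraction = activation_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
ratio = TOTAL_PARAMS_B / ACTIVE_PARAMS_B

print(f"active fraction per token: {fraction:.2%}")  # 4.26%
print(f"total-to-active ratio: {ratio:.1f}x")        # 23.5x
```

Under these figures, roughly one parameter in 23 participates in any single token's forward pass.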