INTELLECT-3: Technical Report

Abstract

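We present INTELLECT-3, a 106B-parameter Mixture-of-Experts (MoE) model with 12B active parameters, trained with large-scale reinforcement learning on our end-to-end RL infrastructure stack. INTELLECT-3 achieves state-of-the-art performance for its size on math, code, science, and reasoning benchmarks, outperforming many larger frontier models. We open-source the model together with the full infrastructure stack used to create it: our RL frameworks, the complete training recipe, and a broad collection of training and evaluation environments built with the verifiers library and available through our Environments Hub community platform. For this work, we introduce prime-rl, an open framework for large-scale asynchronous reinforcement learning that scales seamlessly from a single node to thousands of GPUs and is tailored to agentic RL, with first-class support for multi-turn interactions and tool use. Using this stack, we run both SFT and RL training on top of the GLM-4.5-Air-Base model, scaling RL training to 512 H200 GPUs with high training efficiency.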
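The abstract describes prime-rl's asynchronous design only at a high level. As a rough illustration of the general pattern (not prime-rl's actual implementation or API), the following sketch runs rollout generation in a background thread while the trainer consumes batches. Each batch may be generated with policy weights that lag the trainer by at most `max_staleness` versions. All names here (`PolicyStore`, `inference_worker`, `trainer`, `run`, `max_staleness`) are hypothetical, and the actual gradient update is omitted.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Rollout:
    step: int            # training step this batch is destined for
    policy_version: int  # version of the weights used to generate it


class PolicyStore:
    """Hypothetical stand-in for broadcasting new weights to inference workers."""

    def __init__(self) -> None:
        self._version = 0
        self._cond = threading.Condition()

    def publish(self, version: int) -> None:
        with self._cond:
            self._version = version
            self._cond.notify_all()

    def wait_for(self, min_version: int) -> int:
        # Block until the published weights are at least `min_version`.
        with self._cond:
            self._cond.wait_for(lambda: self._version >= min_version)
            return self._version


def inference_worker(store: PolicyStore, out: queue.Queue, num_steps: int, max_staleness: int) -> None:
    for step in range(num_steps):
        # The batch for `step` may use weights up to `max_staleness` versions old,
        # so generation can run ahead of the trainer instead of waiting for it.
        version = store.wait_for(max(0, step - max_staleness))
        out.put(Rollout(step=step, policy_version=version))


def trainer(store: PolicyStore, inbox: queue.Queue, num_steps: int, log: list) -> None:
    for step in range(num_steps):
        batch = inbox.get()  # may come from an older (off-policy) version of the weights
        log.append((batch.step, batch.policy_version))
        # A real trainer would compute a policy-gradient update on `batch` here.
        store.publish(step + 1)  # weights after this optimizer step


def run(num_steps: int = 8, max_staleness: int = 2) -> list:
    store, q, log = PolicyStore(), queue.Queue(), []
    worker = threading.Thread(target=inference_worker, args=(store, q, num_steps, max_staleness))
    worker.start()
    trainer(store, q, num_steps, log)
    worker.join()
    return log


if __name__ == "__main__":
    for step, version in run():
        print(f"step {step}: rollouts from policy version {version}")
```

With `max_staleness=0` this loop reduces to synchronous, fully on-policy training. Larger values let inference overlap with training, trading some off-policyness (which a real trainer would typically need to correct for) for better hardware utilization.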

