CWM: An Open-Weights LLM for Research on Code Generation with World Models

Abstract

We release Code World Model (CWM), a 32-billion-parameter open-weights LLM, to advance research on code generation with world models. To improve code understanding beyond what can be learned from training on static code alone, we mid-train CWM on a large amount of observation-action trajectories from Python interpreter and agentic Docker environments, and perform extensive multi-task reasoning RL in verifiable coding, math, and multi-turn software engineering environments. With CWM, we provide a strong testbed for researchers to explore the opportunities world modeling affords for improving code generation with reasoning and planning in computational environments. We present first steps of how world models can benefit agentic coding, enable step-by-step simulation of Python code execution, and show early results of how reasoning can benefit from the latter. CWM is a dense, decoder-only LLM trained with a context size of up to 131k tokens. Independent of its world modeling capabilities, CWM offers strong performance on general coding and math tasks: it reaches pass@1 scores of 65.8% on SWE-bench Verified (with test-time scaling), 68.6% on LiveCodeBench, 96.6% on Math-500, and 76.0% on AIME 2024. To support further research on code world modeling, we release model checkpoints after mid-training, SFT, and RL.
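To make the idea of observation-action trajectories from a Python interpreter concrete, the following is a minimal sketch that records, for each line about to execute, the local variable state at that point. It is only an illustration built on the standard `sys.settrace` hook; the helper `trace_execution` and the toy function `running_sum` are hypothetical names, and nothing here is taken from the actual CWM data pipeline or its trace format.

```python
import sys


def trace_execution(func, *args):
    """Run func(*args) and record one step per executed line.

    Each step is (line offset from the `def` line, snapshot of locals
    taken just before that line runs). A final ("return", value) step
    records the return value. Hypothetical helper for illustration only.
    """
    steps = []
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # skip frames of any other function
        if event == "line":
            offset = frame.f_lineno - code.co_firstlineno
            steps.append((offset, dict(frame.f_locals)))
        elif event == "return":
            steps.append(("return", arg))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return result, steps


def running_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total


if __name__ == "__main__":
    result, steps = trace_execution(running_sum, 3)
    for step in steps:
        print(step)
```

In this framing, the executed line plays the role of the action and the variable snapshot plays the role of the observation; a model that simulates execution step by step would be asked to predict the next snapshot given the current one and the line being run.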
