DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

December 2, 2025
Authors: DeepSeek-AI, Aixin Liu, Aoxue Mei, Bangcai Lin, Bing Xue, Bingxuan Wang, Bingzheng Xu, Bochao Wu, Bowei Zhang, Chaofan Lin, Chen Dong, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenhao Xu, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Erhang Li, Fangqi Zhou, Fangyun Lin, Fucong Dai, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Li, Haofen Liang, Haoran Wei, Haowei Zhang, Haowen Luo, Haozhe Ji, Honghui Ding, Hongxuan Tang, Huanqi Cao, Huazuo Gao, Hui Qu, Hui Zeng, Jialiang Huang, Jiashi Li, Jiaxin Xu, Jiewen Hu, Jingchang Chen, Jingting Xiang, Jingyang Yuan, Jingyuan Cheng, Jinhua Zhu, Jun Ran, Junguang Jiang, Junjie Qiu, Junlong Li, Junxiao Song, Kai Dong, Kaige Gao, Kang Guan, Kexin Huang, Kexing Zhou, Kezhao Huang, Kuai Yu, Lean Wang, Lecong Zhang, Lei Wang, Liang Zhao, Liangsheng Yin, Lihua Guo, Lingxiao Luo, Linwang Ma, Litong Wang, Liyue Zhang, M. S. Di, M. Y Xu, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Mingxu Zhou, Panpan Huang, Peixin Cong, Peiyi Wang, Qiancheng Wang, Qihao Zhu, Qingyang Li, Qinyu Chen, Qiushi Du, Ruiling Xu, Ruiqi Ge, Ruisong Zhang, Ruizhe Pan, Runji Wang, Runqiu Yin, Runxin Xu, Ruomeng Shen, Ruoyu Zhang, S. H. Liu, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shaofei Cai, Shaoyuan Chen, Shengding Hu, Shengyu Liu, Shiqiang Hu, Shirong Ma, Shiyu Wang, Shuiping Yu, Shunfeng Zhou, Shuting Pan, Songyang Zhou, Tao Ni, Tao Yun, Tian Pei, Tian Ye, Tianyuan Yue, Wangding Zeng, Wen Liu, Wenfeng Liang, Wenjie Pang, Wenjing Luo, Wenjun Gao, Wentao Zhang, Xi Gao, Xiangwen Wang, Xiao Bi, Xiaodong Liu, Xiaohan Wang, Xiaokang Chen, Xiaokang Zhang, Xiaotao Nie, Xin Cheng, Xin Liu, Xin Xie, Xingchao Liu, Xingkai Yu, Xingyou Li, Xinyu Yang, Xinyuan Li, Xu Chen, Xuecheng Su, Xuehai Pan, Xuheng Lin, Xuwei Fu, Y. Q. Wang, Yang Zhang, Yanhong Xu, Yanru Ma, Yao Li, Yao Li, Yao Zhao, Yaofeng Sun, Yaohui Wang, Yi Qian, Yi Yu, Yichao Zhang, Yifan Ding, Yifan Shi, Yiliang Xiong, Ying He, Ying Zhou, Yinmin Zhong, Yishi Piao, Yisong Wang, Yixiao Chen, Yixuan Tan, Yixuan Wei, Yiyang Ma, Yiyuan Liu, Yonglun Yang, Yongqiang Guo, Yongtong Wu, Yu Wu, Yuan Cheng, Yuan Ou, Yuanfan Xu, Yuduan Wang, Yue Gong, Yuhan Wu, Yuheng Zou, Yukun Li, Yunfan Xiong, Yuxiang Luo, Yuxiang You, Yuxuan Liu, Yuyang Zhou, Z. F. Wu, Z. Z. Ren, Zehua Zhao, Zehui Ren, Zhangli Sha, Zhe Fu, Zhean Xu, Zhenda Xie, Zhengyan Zhang, Zhewen Hao, Zhibin Gou, Zhicheng Ma, Zhigang Yan, Zhihong Shao, Zhixian Huang, Zhiyu Wu, Zhuoshu Li, Zhuping Zhang, Zian Xu, Zihao Wang, Zihui Gu, Zijia Zhu, Zilin Li, Zipeng Zhang, Ziwei Xie, Ziyi Gao, Zizheng Pan, Zongqing Yao, Bei Feng, Hui Li, J. L. Cai, Jiaqi Ni, Lei Xu, Meng Li, Ning Tian, R. J. Chen, R. L. Jin, S. S. Li, Shuang Zhou, Tianyu Sun, X. Q. Li, Xiangyue Jin, Xiaojin Shen, Xiaosha Chen, Xinnan Song, Xinyi Zhou, Y. X. Zhu, Yanping Huang, Yaohui Li, Yi Zheng, Yuchen Zhu, Yunxian Ma, Zhen Huang, Zhipeng Xu, Zhongyu Zhang, Dongjie Ji, Jian Liang, Jianzhong Guo, Jin Chen, Leyi Xia, Miaojun Wang, Mingming Li, Peng Zhang, Ruyi Chen, Shangmian Sun, Shaoqing Wu, Shengfeng Ye, T. Wang, W. L. Xiao, Wei An, Xianzu Wang, Xiaowen Sun, Xiaoxiang Wang, Ying Tang, Yukun Zha, Zekai Zhang, Zhe Ju, Zhen Zhang, Zihua Qu
cs.AI

Abstract

We introduce DeepSeek-V3.2, a model that harmonizes high computational efficiency with superior reasoning and agent performance. The key technical breakthroughs of DeepSeek-V3.2 are as follows: (1) DeepSeek Sparse Attention (DSA): We introduce DSA, an efficient attention mechanism that substantially reduces computational complexity while preserving model performance in long-context scenarios. (2) Scalable Reinforcement Learning Framework: By implementing a robust reinforcement learning protocol and scaling post-training compute, DeepSeek-V3.2 performs comparably to GPT-5. Notably, our high-compute variant, DeepSeek-V3.2-Speciale, surpasses GPT-5 and exhibits reasoning proficiency on par with Gemini-3.0-Pro, achieving gold-medal performance in both the 2025 International Mathematical Olympiad (IMO) and the International Olympiad in Informatics (IOI). (3) Large-Scale Agentic Task Synthesis Pipeline: To integrate reasoning into tool-use scenarios, we developed a novel synthesis pipeline that systematically generates training data at scale. This methodology facilitates scalable agentic post-training, yielding substantial improvements in generalization and instruction-following robustness within complex, interactive environments.
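The abstract describes DSA only at a high level. For intuition on how a sparse attention mechanism can cut cost relative to full quadratic attention, here is a minimal PyTorch sketch in which each query attends only to its top-k highest-scoring keys; the function name, the top-k selection rule, and all parameters are illustrative assumptions, not DeepSeek's actual DSA design.

```python
# A minimal sketch of top-k sparse attention, assuming a simple
# score-and-select rule. This is NOT DeepSeek's DSA: the abstract does
# not specify DSA's selection mechanism, so top-k stands in for it here.
import torch
import torch.nn.functional as F

def topk_sparse_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                          top_k: int = 64) -> torch.Tensor:
    """q, k, v: (seq_len, dim). Each query attends only to its top_k keys."""
    scores = q @ k.transpose(-1, -2) / (q.shape[-1] ** 0.5)   # (T, T) scaled scores
    kth = scores.topk(top_k, dim=-1).values[..., -1:]         # kth-largest score per query
    sparse = scores.masked_fill(scores < kth, float("-inf"))  # mask out all weaker keys
    return F.softmax(sparse, dim=-1) @ v                      # attend over the sparse set

# Usage: 1,024 tokens with 128-dim heads; each query attends to 64 keys.
q, k, v = (torch.randn(1024, 128) for _ in range(3))
print(topk_sparse_attention(q, k, v).shape)  # torch.Size([1024, 128])
```

Note that this masked form still materializes the full score matrix, so it only demonstrates the selection rule; the efficiency gain the abstract claims implies a kernel that skips unselected query-key pairs entirely, which is what reduces complexity in long-context inference.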