Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

Abstract

We introduce Step 3.5 Flash, a sparse Mixture-of-Experts (MoE) model that bridges frontier-level agentic intelligence and computational efficiency. We focus on what matters most when building agents: sharp reasoning and fast, reliable execution. Step 3.5 Flash pairs a 196B-parameter foundation with 11B active parameters for efficient inference. It is optimized with interleaved 3:1 sliding-window/full attention and Multi-Token Prediction (MTP-3) to reduce the latency and cost of multi-round agentic interactions. To reach frontier-level intelligence, we design a scalable reinforcement learning framework that combines verifiable signals with preference feedback, while remaining stable under large-scale off-policy training, enabling consistent self-improvement across mathematics, code, and tool use. Step 3.5 Flash demonstrates strong performance across agent, coding, and math tasks, achieving 85.4% on IMO-AnswerBench, 86.4% on LiveCodeBench-v6 (2024.08-2025.05), 88.2% on tau2-Bench, 69.0% on BrowseComp (with context management), and 51.0% on Terminal-Bench 2.0, comparable to frontier models such as GPT-5.2 xHigh and Gemini 3.0 Pro. By redefining the efficiency frontier, Step 3.5 Flash provides a high-density foundation for deploying sophisticated agents in real-world industrial environments.
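
To make the two efficiency figures in the abstract concrete, the following minimal sketch lays out an interleaved 3:1 sliding-window/full attention pattern and computes the fraction of parameters active per token. It is an illustration under stated assumptions, not the model's actual configuration: the abstract gives only the 3:1 ratio and the 196B/11B parameter counts, so the layer count of 8, the placement of the full-attention layer at the end of each group of four, and the helper names `attention_layout` and `active_fraction` are all hypothetical.

```python
def attention_layout(num_layers: int, swa_per_full: int = 3) -> list[str]:
    """Per-layer attention types: `swa_per_full` sliding-window layers,
    then one full-attention layer, repeated.

    Assumption: the full-attention layer closes each group. The abstract
    states only the 3:1 ratio, not the ordering within a group.
    """
    period = swa_per_full + 1
    return ["full" if (i + 1) % period == 0 else "swa" for i in range(num_layers)]


def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of parameters used per token in a sparse MoE model."""
    return active_params_b / total_params_b


# Hypothetical 8-layer stack, chosen only to show two full groups.
layout = attention_layout(8)
print(layout)
# ['swa', 'swa', 'swa', 'full', 'swa', 'swa', 'swa', 'full']

# 196B total and 11B active parameters are the figures from the abstract.
print(f"{active_fraction(196, 11):.1%}")
# 5.6%
```

Under these assumptions, three of every four layers attend over a bounded window, and roughly 5.6% of the parameters participate in each token's forward pass, which is consistent with the abstract's claim of reduced latency and cost in multi-round agentic interactions.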