Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

Abstract

We introduce Step 3.5 Flash, a sparse Mixture-of-Experts (MoE) model that bridges frontier-level agentic intelligence and computational efficiency. We focus on what matters most when building agents: sharp reasoning and fast, reliable execution. Step 3.5 Flash pairs a 196B-parameter foundation with 11B active parameters for efficient inference. It is optimized with interleaved 3:1 sliding-window/full attention and Multi-Token Prediction (MTP-3) to reduce the latency and cost of multi-round agentic interactions. To reach frontier-level intelligence, we design a scalable reinforcement learning framework that combines verifiable signals with preference feedback, while remaining stable under large-scale off-policy training, enabling consistent self-improvement across mathematics, code, and tool use. Step 3.5 Flash demonstrates strong performance across agent, coding, and math tasks, achieving 85.4% on IMO-AnswerBench, 86.4% on LiveCodeBench-v6 (2024.08-2025.05), 88.2% on tau2-Bench, 69.0% on BrowseComp (with context management), and 51.0% on Terminal-Bench 2.0, comparable to frontier models such as GPT-5.2 xHigh and Gemini 3.0 Pro. By redefining the efficiency frontier, Step 3.5 Flash provides a high-density foundation for deploying sophisticated agents in real-world industrial environments.
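The architectural figures quoted above can be made concrete with a small sketch. It illustrates what a 3:1 interleaving of sliding-window and full attention layers could look like, and how small the active share of an 11B-of-196B sparse model is. The abstract does not give the layer count, the window size, or where the full-attention layer sits within each group of four, so `num_layers`, `window`, and the "three sliding layers, then one full layer" ordering are assumptions made purely for illustration, and the function names are hypothetical rather than taken from the model's code.

```python
from typing import List, Optional


def attention_schedule(num_layers: int, swa_per_full: int = 3) -> List[str]:
    """Per-layer attention type: `swa_per_full` sliding-window layers, then one full layer.

    The position of the full layer inside each group is an assumption;
    the abstract only states the 3:1 ratio.
    """
    period = swa_per_full + 1
    return ["full" if (i + 1) % period == 0 else "sliding" for i in range(num_layers)]


def causal_mask(seq_len: int, window: Optional[int] = None) -> List[List[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    window=None gives ordinary full causal attention; an integer window
    restricts each query to itself and the previous window - 1 positions.
    """
    return [
        [k <= q and (window is None or q - k < window) for k in range(seq_len)]
        for q in range(seq_len)
    ]


def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Share of parameters used per token in a sparse MoE model."""
    return active_params_b / total_params_b


if __name__ == "__main__":
    # Hypothetical 8-layer stack, chosen only to show two full periods.
    print(attention_schedule(8))
    # Hypothetical window of 2 tokens on a 4-token sequence.
    for row in causal_mask(4, window=2):
        print(row)
    # 11B active out of 196B total, as stated in the abstract.
    print(f"{active_fraction(11, 196):.1%}")
```

Under these assumptions, three out of every four layers attend only to a bounded window, so their attention cost and key-value cache stay constant as a multi-round agent context grows, while the periodic full-attention layers retain access to the entire history.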
