InCoder-32B: Code Foundation Model for Industrial Scenarios
March 17, 2026
Authors: Jian Yang, Wei Zhang, Jiajun Wu, Junhang Cheng, Shawn Guo, Haowen Wang, Weicheng Gu, Yaxin Du, Joseph Li, Fanglin Xu, Yizhi Li, Lin Jing, Yuanbo Wang, Yuhan Gao, Ruihao Gong, Chuan Hao, Ran Tao, Aishan Liu, Tuney Zheng, Ganqu Cui, Zhoujun Li, Mingjie Tang, Chenghua Lin, Wayne Xin Zhao, Xianglong Liu, Ming Zhou, Bryan Dai, Weifeng Lv
cs.AI
Abstract
Recent code large language models have achieved remarkable progress on general programming tasks. Nevertheless, their performance degrades significantly in industrial scenarios that require reasoning about hardware semantics, specialized language constructs, and strict resource constraints. To address these challenges, we introduce InCoder-32B (Industrial-Coder-32B), the first 32B-parameter code foundation model unifying code intelligence across chip design, GPU kernel optimization, embedded systems, compiler optimization, and 3D modeling. By adopting an efficient architecture, we train InCoder-32B from scratch with general code pre-training, curated industrial code annealing, mid-training that progressively extends context from 8K to 128K tokens with synthetic industrial reasoning data, and post-training with execution-grounded verification. We conduct extensive evaluation on 14 mainstream general code benchmarks and 9 industrial benchmarks spanning 4 specialized domains. Results show InCoder-32B achieves highly competitive performance on general tasks while establishing strong open-source baselines across industrial domains.