InCoder-32B: Code Foundation Model for Industrial Scenarios

Abstract



Recent code large language models have made remarkable progress on general programming tasks. However, their performance degrades significantly in industrial scenarios that require reasoning about hardware semantics, specialized language constructs, and strict resource constraints. To address these challenges, we introduce InCoder-32B (Industrial-Coder-32B), the first 32B-parameter code foundation model that unifies code intelligence across chip design, GPU kernel optimization, embedded systems, compiler optimization, and 3D modeling. Using an efficient architecture, we train InCoder-32B from scratch through general code pre-training, curated industrial code annealing, mid-training that progressively extends the context from 8K to 128K tokens with synthetic industrial reasoning data, and post-training with execution-grounded verification. We evaluate it extensively on 14 mainstream general code benchmarks and 9 industrial benchmarks spanning 4 specialized domains. The results show that InCoder-32B is highly competitive on general tasks while establishing strong open-source baselines across industrial domains.
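To make the staged recipe concrete, here is a minimal sketch that lists the four training stages named in the abstract and generates one possible context-length schedule for mid-training. The abstract states only the endpoints (8K and 128K tokens). The doubling steps, and the names `STAGES` and `context_schedule`, are hypothetical illustrations and do not describe the authors' actual method.

```python
# Hypothetical sketch: the stage names come from the abstract; the doubling
# context schedule is an assumption (the abstract gives only 8K -> 128K).

STAGES = [
    "general code pre-training",
    "curated industrial code annealing",
    "mid-training (progressive context extension, synthetic industrial reasoning data)",
    "post-training (execution-grounded verification)",
]


def context_schedule(start: int = 8 * 1024, end: int = 128 * 1024) -> list[int]:
    """Return context lengths that double from `start` until reaching `end`.

    The final entry is always `end`, even if it is not an exact doubling step.
    """
    if start <= 0 or end < start:
        raise ValueError("require 0 < start <= end")
    lengths = []
    n = start
    while n < end:
        lengths.append(n)
        n *= 2
    lengths.append(end)
    return lengths


if __name__ == "__main__":
    for i, name in enumerate(STAGES, 1):
        print(f"stage {i}: {name}")
    # Print the assumed mid-training schedule in K tokens.
    print([n // 1024 for n in context_schedule()])
```

With the defaults, the assumed schedule steps through 8K, 16K, 32K, 64K, and 128K tokens. A real recipe may use different step sizes or token budgets per step.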
