AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs

Abstract

Kernel development in deep learning requires optimizing computational units across hardware while balancing memory management, parallelism, and hardware-specific optimizations through extensive empirical tuning. Although domain-specific languages like Triton simplify GPU programming by abstracting away low-level details, developers must still manually tune critical parameters such as tile sizes and memory access patterns through iterative experimentation, which creates substantial barriers to optimal performance and wider adoption. In this work, we introduce AutoTriton, the first model dedicated to Triton programming powered by reinforcement learning (RL). AutoTriton is trained in two sequential stages: it first performs supervised fine-tuning (SFT) on data from a high-quality gathering pipeline to acquire essential Triton programming expertise, and then conducts RL with the Group Relative Policy Optimization (GRPO) algorithm, combining a rule-based reward and an execution-based reward to further improve its Triton programming ability. Experiments across five evaluation channels of TritonBench and KernelBench show that our 8B model AutoTriton achieves performance comparable to mainstream large models, including Claude-4-Sonnet and DeepSeek-R1-0528. Further experimental analysis demonstrates the crucial role of each module within AutoTriton: the SFT stage, the RL stage, and the reward design strategy. These findings underscore the promise of RL for automatically generating high-performance kernels. Since such kernels are core components of AI systems, this breakthrough establishes an important foundation for building more efficient AI systems. The model and code will be available at https://github.com/AI9Stars/AutoTriton.
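The abstract names GRPO and a combination of a rule-based reward and an execution-based reward, but it does not give their exact definitions. The Python sketch below shows one plausible way these pieces could fit together. Everything here is an assumption for illustration, not AutoTriton's actual implementation. The names `Rollout`, `rule_reward`, `execution_reward`, `combined_reward`, and `group_relative_advantages` are hypothetical. The rule check (the code must contain an `@triton.jit` kernel) and the multiplicative gating are guesses. The `passed_tests` flag stands in for actually compiling and running the generated kernel against a reference. The advantage computation follows the standard GRPO formulation, in which each reward is normalized by its group's mean and standard deviation.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Rollout:
    code: str           # generated program text
    passed_tests: bool  # hypothetical stand-in for executing the kernel against reference tests


def rule_reward(code: str) -> float:
    """Hypothetical rule-based check: the program must define a Triton kernel."""
    return 1.0 if "@triton.jit" in code else 0.0


def execution_reward(rollout: Rollout) -> float:
    """Execution-based reward: 1.0 if the generated program passed its tests."""
    return 1.0 if rollout.passed_tests else 0.0


def combined_reward(rollout: Rollout) -> float:
    # Assumption: the rule-based reward gates the execution reward, so a
    # program that passes the tests without using a Triton kernel scores 0.
    return rule_reward(rollout.code) * execution_reward(rollout)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: (r_i - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the sampled group
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    kernel = "@triton.jit\ndef add_kernel(x_ptr, y_ptr, out_ptr, n): ..."
    group = [
        Rollout(kernel, True),                             # Triton kernel, correct
        Rollout("def f(x):\n    return x.relu()", True),   # correct but no Triton kernel
        Rollout(kernel, False),                            # Triton kernel, fails tests
        Rollout(kernel, True),                             # Triton kernel, correct
    ]
    rewards = [combined_reward(r) for r in group]
    print(rewards)  # [1.0, 0.0, 0.0, 1.0]
    print([round(a, 3) for a in group_relative_advantages(rewards)])  # [1.0, -1.0, -1.0, 1.0]
```

Under these assumptions, the gating choice means the second rollout, which passes its tests by falling back to PyTorch, receives the same negative advantage as the rollout whose kernel fails its tests.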