AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs
July 8, 2025
Authors: Shangzhan Li, Zefan Wang, Ye He, Yuxuan Li, Qi Shi, Jianling Li, Yonggang Hu, Wanxiang Che, Xu Han, Zhiyuan Liu, Maosong Sun
cs.AI
Abstract
Kernel development in deep learning requires optimizing computational units
across hardware while balancing memory management, parallelism, and
hardware-specific optimizations through extensive empirical tuning. Although
domain-specific languages like Triton simplify GPU programming by abstracting
low-level details, developers must still manually tune critical parameters such
as tile sizes and memory access patterns through iterative experimentation,
creating substantial barriers to optimal performance and wider adoption. In
this work, we introduce AutoTriton, the first model dedicated to Triton
programming powered by reinforcement learning (RL). AutoTriton first performs
supervised fine-tuning (SFT) on data gathered by a high-quality data collection
pipeline to acquire essential Triton programming expertise, and then conducts
RL with the Group Relative Policy Optimization (GRPO) algorithm, combining a
rule-based reward and an execution-based reward to further improve its Triton
programming ability. Experiments across five evaluation channels of
TritonBench and KernelBench illustrate that our 8B model AutoTriton achieves
performance comparable to mainstream large models, including Claude-4-Sonnet
and DeepSeek-R1-0528. Further experimental analysis demonstrates the crucial
role of each module within AutoTriton, including the SFT stage, the RL stage,
and the reward design strategy. These findings underscore the promise of RL for
automatically generating high-performance kernels, which are core components of
AI systems, and thereby establish an important foundation for building more
efficient AI systems. The model and code
will be available at https://github.com/AI9Stars/AutoTriton.
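For readers unfamiliar with Triton, the sketch below shows the kind of kernel the abstract refers to. It is a minimal vector-addition kernel following the standard Triton tutorial pattern, not code from the paper; the names add_kernel and add and the default block size of 1024 are illustrative assumptions. The BLOCK_SIZE tile parameter is exactly the sort of value that normally requires manual, empirical tuning and that AutoTriton aims to choose automatically.

```python
# Minimal Triton kernel sketch (illustrative only, not from the paper).
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                       # one program instance per tile
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                       # guard the ragged last tile
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


def add(x: torch.Tensor, y: torch.Tensor, block_size: int = 1024) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, block_size),)              # number of tiles to launch
    # BLOCK_SIZE is the tile-size knob whose best value depends on the hardware.
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=block_size)
    return out
```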
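The abstract states only that the RL stage uses GRPO with a rule-based reward combined with an execution-based reward; the concrete reward design is described in the paper, not here. The following sketch is a hypothetical illustration of that idea: the function names, the run_and_compare callable, and the 0.3/0.7 weights are all assumptions, and the advantage computation follows the standard GRPO formulation (rewards mean-centred and std-normalized within a group of samples for the same prompt).

```python
# Hypothetical reward sketch for RL on generated Triton code (not the paper's design).
import statistics


def rule_based_reward(code: str) -> float:
    """Cheap static checks: the sample must at least look like a Triton kernel."""
    looks_like_triton = "@triton.jit" in code and "triton.language" in code
    return 1.0 if looks_like_triton else 0.0


def execution_based_reward(code: str, run_and_compare) -> float:
    """Compile and run the candidate kernel, comparing its output to a reference.

    `run_and_compare` is an assumed user-supplied callable that executes the
    code in a sandbox and returns True when the output matches the reference.
    """
    try:
        return 1.0 if run_and_compare(code) else 0.0
    except Exception:                                  # compilation or runtime failure
        return 0.0


def combined_reward(code: str, run_and_compare,
                    w_rule: float = 0.3, w_exec: float = 0.7) -> float:
    # A weighted sum is only one plausible way to combine the two signals.
    return (w_rule * rule_based_reward(code)
            + w_exec * execution_based_reward(code, run_and_compare))


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: normalize rewards within a group of samples."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0          # avoid division by zero
    return [(r - mu) / sigma for r in rewards]
```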