AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs
July 8, 2025
Authors: Shangzhan Li, Zefan Wang, Ye He, Yuxuan Li, Qi Shi, Jianling Li, Yonggang Hu, Wanxiang Che, Xu Han, Zhiyuan Liu, Maosong Sun
cs.AI
Abstract
Kernel development in deep learning requires optimizing computational units
across hardware while balancing memory management, parallelism, and
hardware-specific optimizations through extensive empirical tuning. Although
domain-specific languages like Triton simplify GPU programming by abstracting
low-level details, developers must still manually tune critical parameters such
as tile sizes and memory access patterns through iterative experimentation,
creating substantial barriers to optimal performance and wider adoption. In
this work, we introduce AutoTriton, the first model dedicated to Triton
programming powered by reinforcement learning (RL). AutoTriton first performs
supervised fine-tuning (SFT) on data gathered by a high-quality data collection
pipeline to acquire essential Triton programming expertise, and then conducts
RL with the Group Relative Policy Optimization (GRPO) algorithm, combining a
rule-based reward and an execution-based reward to further improve its Triton
programming ability. Experiments across five evaluation channels of
TritonBench and KernelBench illustrate that our 8B model AutoTriton achieves
performance comparable to mainstream large models, including Claude-4-Sonnet
and DeepSeek-R1-0528. Further experimental analysis demonstrates the crucial
role of each module within AutoTriton, including the SFT stage, the RL stage,
and the reward design strategy. These findings underscore the promise of RL for
automatically generating high-performance kernels, which are core components of
AI systems, and thereby establish an important foundation for building more
efficient AI systems. The model and code
will be available at https://github.com/AI9Stars/AutoTriton.
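For readers unfamiliar with Triton, the sketch below shows the kind of kernel the abstract refers to. It is a minimal vector-addition kernel following the standard Triton tutorial pattern, not code from the paper; the names add_kernel and add and the default block size of 1024 are illustrative assumptions. The BLOCK_SIZE tile parameter is exactly the sort of value that normally requires manual, empirical tuning and that AutoTriton aims to choose automatically.

```python
# Minimal Triton kernel sketch (illustrative only, not from the paper).
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                       # one program instance per tile
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                       # guard the ragged last tile
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


def add(x: torch.Tensor, y: torch.Tensor, block_size: int = 1024) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, block_size),)              # number of tiles to launch
    # BLOCK_SIZE is the tile-size knob whose best value depends on the hardware.
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=block_size)
    return out
```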
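The abstract states only that the RL stage uses GRPO with a rule-based reward combined with an execution-based reward; the concrete reward design is described in the paper, not here. The following sketch is a hypothetical illustration of that idea: the function names, the run_and_compare callable, and the 0.3/0.7 weights are all assumptions, and the advantage computation follows the standard GRPO formulation (rewards mean-centred and std-normalized within a group of samples for the same prompt).

```python
# Hypothetical reward sketch for RL on generated Triton code (not the paper's design).
import statistics


def rule_based_reward(code: str) -> float:
    """Cheap static checks: the sample must at least look like a Triton kernel."""
    looks_like_triton = "@triton.jit" in code and "triton.language" in code
    return 1.0 if looks_like_triton else 0.0


def execution_based_reward(code: str, run_and_compare) -> float:
    """Compile and run the candidate kernel, comparing its output to a reference.

    `run_and_compare` is an assumed user-supplied callable that executes the
    code in a sandbox and returns True when the output matches the reference.
    """
    try:
        return 1.0 if run_and_compare(code) else 0.0
    except Exception:                                  # compilation or runtime failure
        return 0.0


def combined_reward(code: str, run_and_compare,
                    w_rule: float = 0.3, w_exec: float = 0.7) -> float:
    # A weighted sum is only one plausible way to combine the two signals.
    return (w_rule * rule_based_reward(code)
            + w_exec * execution_based_reward(code, run_and_compare))


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: normalize rewards within a group of samples."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0          # avoid division by zero
    return [(r - mu) / sigma for r in rewards]
```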