AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs
July 8, 2025
Authors: Shangzhan Li, Zefan Wang, Ye He, Yuxuan Li, Qi Shi, Jianling Li, Yonggang Hu, Wanxiang Che, Xu Han, Zhiyuan Liu, Maosong Sun
cs.AI
Abstract
Kernel development in deep learning requires optimizing computational units
across hardware while balancing memory management, parallelism, and
hardware-specific optimizations through extensive empirical tuning. Although
domain-specific languages like Triton simplify GPU programming by abstracting
low-level details, developers must still manually tune critical parameters such
as tile sizes and memory access patterns through iterative experimentation,
creating substantial barriers to optimal performance and wider adoption. In
this work, we introduce AutoTriton, the first model dedicated to Triton
programming powered by reinforcement learning (RL). AutoTriton first performs
supervised fine-tuning (SFT) on data collected by a high-quality data-gathering
pipeline to acquire essential Triton programming expertise, and then conducts
RL with the Group Relative Policy Optimization (GRPO) algorithm, combining a
rule-based reward with an execution-based reward to further improve its Triton
programming ability. Experiments across five evaluation channels of
TritonBench and KernelBench show that our 8B model AutoTriton achieves
performance comparable to mainstream large models, including Claude-4-Sonnet
and DeepSeek-R1-0528. Further experimental analysis demonstrates the crucial
role of each module within AutoTriton, including the SFT stage, the RL stage,
and the reward design strategy. These findings underscore the promise of RL for
automatically generating high-performance kernels, which are core components of
AI systems, and thus establish an important foundation for building more
efficient AI systems. The model and code
will be available at https://github.com/AI9Stars/AutoTriton.
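The abstract does not spell out the exact reward definitions, so the following is only a minimal sketch of how a rule-based reward and an execution-based reward could be combined when scoring generated Triton kernels for RL. The function names (`rule_reward`, `execution_reward`, `combined_reward`), the weights, and the `entry_point` launch-function convention are illustrative assumptions, not the paper's implementation.

```python
# Sketch only: a combined rule-based + execution-based reward for generated
# Triton kernels, under assumed conventions (not AutoTriton's actual code).
import ast

import torch


def rule_reward(candidate_src: str) -> float:
    """Cheap static check: the code must parse and define a @triton.jit kernel."""
    try:
        tree = ast.parse(candidate_src)
    except SyntaxError:
        return 0.0
    has_jit_kernel = any(
        isinstance(node, ast.FunctionDef)
        and any("triton" in ast.dump(dec) and "jit" in ast.dump(dec)
                for dec in node.decorator_list)
        for node in ast.walk(tree)
    )
    return 1.0 if has_jit_kernel else 0.0


def execution_reward(candidate_src: str, entry_point: str,
                     inputs: tuple, reference: torch.Tensor) -> float:
    """Run the generated launch function and compare its output to a reference."""
    namespace: dict = {}
    try:
        exec(compile(candidate_src, "<candidate>", "exec"), namespace)
        output = namespace[entry_point](*inputs)
        if torch.allclose(output, reference, rtol=1e-3, atol=1e-3):
            return 1.0
    except Exception:
        pass  # compile/launch/correctness failures all yield zero reward
    return 0.0


def combined_reward(candidate_src: str, entry_point: str,
                    inputs: tuple, reference: torch.Tensor,
                    w_rule: float = 0.1, w_exec: float = 0.9) -> float:
    """Weighted sum of the two signals (weights are illustrative)."""
    r_rule = rule_reward(candidate_src)
    # Only pay the cost of execution if the static check passes.
    r_exec = (execution_reward(candidate_src, entry_point, inputs, reference)
              if r_rule > 0 else 0.0)
    return w_rule * r_rule + w_exec * r_exec
```

In a GRPO training loop, a scalar reward like this would be computed for each sampled completion in a group and normalized within the group to obtain relative advantages for the policy update.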