Improving Assembly Code Performance with Large Language Models via Reinforcement Learning
May 16, 2025
Authors: Anjiang Wei, Tarun Suresh, Huanmi Tan, Yinglun Xu, Gagandeep Singh, Ke Wang, Alex Aiken
cs.AI
Abstract
Large language models (LLMs) have demonstrated strong performance across a
wide range of programming tasks, yet their potential for code optimization
remains underexplored. This work investigates whether LLMs can optimize the
performance of assembly code, where fine-grained control over execution enables
improvements that are difficult to express in high-level languages. We present
a reinforcement learning framework that trains LLMs using Proximal Policy
Optimization (PPO), guided by a reward function that considers both functional
correctness, validated through test cases, and execution performance relative
to the industry-standard compiler gcc -O3. To support this study, we introduce
a benchmark of 8,072 real-world programs. Our model, Qwen2.5-Coder-7B-PPO,
achieves 96.0% test pass rates and an average speedup of 1.47x over the gcc -O3
baseline, outperforming all 20 other models evaluated, including
Claude-3.7-sonnet. These results indicate that reinforcement learning can
unlock the potential of LLMs to serve as effective optimizers for assembly code
performance.
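The abstract describes a PPO reward that combines functional correctness, checked against test cases, with execution performance relative to the gcc -O3 baseline. Below is a minimal sketch of one plausible reward shape; the helper names (run_binary, TestCase) and the exact formula are illustrative assumptions, not the paper's actual implementation.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TestCase:
    stdin: str
    expected_stdout: str


def run_binary(binary_path: str, stdin: str, timeout_s: float = 5.0) -> Optional[str]:
    """Run a compiled binary on one input; return its stdout, or None on crash/timeout."""
    try:
        proc = subprocess.run(
            [binary_path], input=stdin, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.stdout if proc.returncode == 0 else None


def reward(candidate_binary: str, tests: List[TestCase],
           baseline_time_s: float, candidate_time_s: float) -> float:
    """Hypothetical reward: 0.0 if any test fails, otherwise the measured
    speedup of the candidate over the gcc -O3 baseline (1.0 = parity)."""
    for t in tests:
        out = run_binary(candidate_binary, t.stdin)
        if out is None or out.strip() != t.expected_stdout.strip():
            return 0.0  # correctness gate: incorrect code earns no performance credit
    return baseline_time_s / max(candidate_time_s, 1e-9)
```

In an actual training loop, the assembly emitted by the LLM would first be assembled and linked, then timed against the gcc -O3 build of the original program before a reward of this kind is computed.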