Improving Assembly Code Performance with Large Language Models via Reinforcement Learning
May 16, 2025
Authors: Anjiang Wei, Tarun Suresh, Huanmi Tan, Yinglun Xu, Gagandeep Singh, Ke Wang, Alex Aiken
cs.AI
Abstract
Large language models (LLMs) have demonstrated strong performance across a
wide range of programming tasks, yet their potential for code optimization
remains underexplored. This work investigates whether LLMs can optimize the
performance of assembly code, where fine-grained control over execution enables
improvements that are difficult to express in high-level languages. We present
a reinforcement learning framework that trains LLMs using Proximal Policy
Optimization (PPO), guided by a reward function that considers both functional
correctness, validated through test cases, and execution performance relative
to the industry-standard compiler gcc -O3. To support this study, we introduce
a benchmark of 8,072 real-world programs. Our model, Qwen2.5-Coder-7B-PPO,
achieves 96.0% test pass rates and an average speedup of 1.47x over the gcc -O3
baseline, outperforming all 20 other models evaluated, including
Claude-3.7-sonnet. These results indicate that reinforcement learning can
unlock the potential of LLMs to serve as effective optimizers for assembly code
performance.
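The abstract describes a PPO reward that combines functional correctness, checked against test cases, with execution performance relative to the gcc -O3 baseline. Below is a minimal sketch of one plausible reward shape; the helper names (run_binary, TestCase) and the exact formula are illustrative assumptions, not the paper's actual implementation.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TestCase:
    stdin: str
    expected_stdout: str


def run_binary(binary_path: str, stdin: str, timeout_s: float = 5.0) -> Optional[str]:
    """Run a compiled binary on one input; return its stdout, or None on crash/timeout."""
    try:
        proc = subprocess.run(
            [binary_path], input=stdin, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.stdout if proc.returncode == 0 else None


def reward(candidate_binary: str, tests: List[TestCase],
           baseline_time_s: float, candidate_time_s: float) -> float:
    """Hypothetical reward: 0.0 if any test fails, otherwise the measured
    speedup of the candidate over the gcc -O3 baseline (1.0 = parity)."""
    for t in tests:
        out = run_binary(candidate_binary, t.stdin)
        if out is None or out.strip() != t.expected_stdout.strip():
            return 0.0  # correctness gate: incorrect code earns no performance credit
    return baseline_time_s / max(candidate_time_s, 1e-9)
```

In an actual training loop, the assembly emitted by the LLM would first be assembled and linked, then timed against the gcc -O3 build of the original program before a reward of this kind is computed.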