

Improving Assembly Code Performance with Large Language Models via Reinforcement Learning

May 16, 2025
Authors: Anjiang Wei, Tarun Suresh, Huanmi Tan, Yinglun Xu, Gagandeep Singh, Ke Wang, Alex Aiken
cs.AI

Abstract

Large language models (LLMs) have demonstrated strong performance across a wide range of programming tasks, yet their potential for code optimization remains underexplored. This work investigates whether LLMs can optimize the performance of assembly code, where fine-grained control over execution enables improvements that are difficult to express in high-level languages. We present a reinforcement learning framework that trains LLMs using Proximal Policy Optimization (PPO), guided by a reward function that considers both functional correctness, validated through test cases, and execution performance relative to the industry-standard compiler gcc -O3. To support this study, we introduce a benchmark of 8,072 real-world programs. Our model, Qwen2.5-Coder-7B-PPO, achieves 96.0% test pass rates and an average speedup of 1.47x over the gcc -O3 baseline, outperforming all 20 other models evaluated, including Claude-3.7-sonnet. These results indicate that reinforcement learning can unlock the potential of LLMs to serve as effective optimizers for assembly code performance.
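The reward described above couples test-case correctness with measured runtime relative to gcc -O3. As a rough illustration only, a correctness-gated speedup reward might be computed as in the sketch below; the function name `assembly_reward`, its arguments, and the zero-reward rule for failing programs are assumptions for exposition, not the paper's exact formulation.

```python
# Minimal sketch (assumed, not the paper's exact reward) of a PPO reward
# that gates execution speedup on functional correctness.

def assembly_reward(passed_tests: int, total_tests: int,
                    gcc_o3_time: float, candidate_time: float) -> float:
    """Return 0 if any test fails; otherwise the speedup over gcc -O3.

    Argument names and the exact shaping are illustrative assumptions:
    `gcc_o3_time` and `candidate_time` stand for measured wall-clock
    runtimes of the gcc -O3 binary and the LLM-generated assembly.
    """
    if passed_tests < total_tests:
        # Functionally incorrect programs earn no performance credit.
        return 0.0
    # Values above 1.0 mean the generated assembly beats gcc -O3.
    return gcc_o3_time / candidate_time


# Example: all tests pass and the candidate halves the runtime -> reward 2.0
print(assembly_reward(10, 10, gcc_o3_time=1.0, candidate_time=0.5))
```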

