Large Language Models for Compiler Optimization

Abstract

We explore the novel application of Large Language Models to code optimization. We present a 7B-parameter transformer model trained from scratch to optimize LLVM assembly for code size. The model takes unoptimized assembly as input and outputs a list of compiler options that best optimize the program. Crucially, during training, we ask the model to predict the instruction counts before and after optimization, as well as the optimized code itself. These auxiliary learning tasks significantly improve the model's optimization performance and deepen its understanding. We evaluate on a large suite of test programs. Our approach achieves a 3.0% improvement in reducing instruction counts over the compiler, outperforming two state-of-the-art baselines that require thousands of compilations. Furthermore, the model shows surprisingly strong code reasoning abilities, generating compilable code 91% of the time and perfectly emulating the output of the compiler 70% of the time.
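
As an illustration of the setup described in the abstract, the following minimal sketch shows how one training pair and the reported metric might be represented. The abstract does not specify a data format, so the field layout, the names `TrainingExample`, `build_example`, and `improvement_over_compiler`, and the pass names in the usage note are all assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class TrainingExample:
    """One hypothetical training pair: prompt text and target text."""
    prompt: str
    target: str


def build_example(unoptimized_ir, pass_list, count_before, count_after, optimized_ir):
    """Assemble a prompt/target pair.

    The target holds the compiler options plus the auxiliary predictions
    (instruction counts before and after optimization, and the optimized
    code). The textual layout here is an assumption, not the paper's format.
    """
    target = "\n".join([
        "passes: " + " ".join(pass_list),
        f"instruction count before: {count_before}",
        f"instruction count after: {count_after}",
        "optimized code:",
        optimized_ir,
    ])
    return TrainingExample(prompt=unoptimized_ir, target=target)


def improvement_over_compiler(compiler_count, model_count):
    """Percentage reduction in instruction count relative to the compiler's result."""
    return 100.0 * (compiler_count - model_count) / compiler_count
```

For example, `build_example(ir_text, ["-mem2reg", "-instcombine"], 12, 9, optimized_text)` would pair the unoptimized input with a target that lists the options first and the auxiliary predictions after them. Under this definition of the metric, a program that the compiler reduces to 1000 instructions and the model's option list reduces to 970 corresponds to a 3.0% improvement.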
