Compiler generated feedback for Large Language Models
March 18, 2024
Authors: Dejan Grubisic, Chris Cummins, Volker Seeker, Hugh Leather
cs.AI
Abstract
We introduce a novel paradigm in compiler optimization powered by Large Language Models with compiler feedback to optimize the code size of LLVM assembly. The model takes unoptimized LLVM IR as input and produces optimized IR, the best optimization passes, and instruction counts of both the unoptimized and optimized IRs. We then compile the input with the generated optimization passes and evaluate whether the predicted instruction count is correct, the generated IR is compilable, and the generated IR corresponds to the compiled code. We provide this feedback back to the LLM and give it another chance to optimize the code. This approach adds an extra 0.53% improvement over -Oz on top of the original model. Even though adding more information with feedback seems intuitive, simple sampling techniques achieve much higher performance given 10 or more samples.
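To make the feedback loop concrete, below is a minimal sketch in Python of the validate-and-retry cycle the abstract describes. The `query_model` function is a hypothetical stand-in for the paper's fine-tuned LLM, and the prompt format, feedback encoding, and line-based instruction-count proxy are all assumptions for illustration, not the paper's actual interface. Only LLVM's real `opt` tool is invoked.

```python
# Sketch of the compiler-feedback loop: the model proposes passes,
# instruction counts, and optimized IR; the compiler checks each claim;
# mismatches are fed back so the model can try again.
import subprocess
import tempfile


def run_opt(ir: str, passes: str = "verify") -> subprocess.CompletedProcess:
    """Run LLVM's `opt` on textual IR; `-passes=verify` just checks validity."""
    with tempfile.NamedTemporaryFile("w", suffix=".ll") as f:
        f.write(ir)
        f.flush()
        return subprocess.run(
            ["opt", f"-passes={passes}", "-S", f.name, "-o", "-"],
            capture_output=True, text=True,
        )


def instruction_count(ir: str) -> int:
    """Crude proxy: count indented body lines that are not comments."""
    return sum(
        1
        for line in ir.splitlines()
        if line.startswith("  ") and not line.lstrip().startswith(";")
    )


def query_model(prompt: str) -> tuple[str, int, str]:
    """Hypothetical LLM call; returns (pass list, predicted count, generated IR)."""
    raise NotImplementedError("stand-in for the fine-tuned model")


def optimize_with_feedback(unoptimized_ir: str, rounds: int = 2) -> str:
    prompt = unoptimized_ir
    best_ir = unoptimized_ir
    for _ in range(rounds):
        passes, predicted_count, generated_ir = query_model(prompt)

        # Ground truth: actually compile the input with the suggested passes.
        compiled = run_opt(unoptimized_ir, passes)
        if compiled.returncode != 0:
            feedback = [f"Pass list failed to compile: {compiled.stderr.strip()}"]
        else:
            feedback = []
            actual = instruction_count(compiled.stdout)
            if predicted_count != actual:
                feedback.append(
                    f"Predicted {predicted_count} instructions, got {actual}."
                )
            if run_opt(generated_ir).returncode != 0:
                feedback.append("Generated IR does not compile.")
            elif generated_ir.strip() != compiled.stdout.strip():
                feedback.append("Generated IR differs from the compiled IR.")
            best_ir = compiled.stdout
        if not feedback:
            break  # every prediction checked out; stop early
        # Encode the compiler's verdict into the prompt and let the model retry.
        prompt = unoptimized_ir + "\n; FEEDBACK:\n; " + "\n; ".join(feedback)
    return best_ir
```

The design point this illustrates is that the compiler serves as a cheap, deterministic verifier: every model claim (pass list, counts, IR) is checkable, so the feedback given on a retry is grounded rather than self-generated.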