Large Language Models for Compiler Optimization

Abstract

We explore the novel application of Large Language Models to code optimization. We present a 7B-parameter transformer model trained from scratch to optimize LLVM assembly for code size. The model takes unoptimized assembly as input and outputs a list of compiler options that best optimize the program. Crucially, during training, we also ask the model to predict the instruction counts before and after optimization, as well as the optimized code itself. These auxiliary learning tasks significantly improve the model's optimization performance and deepen its understanding of the code. We evaluate on a large suite of test programs. Our approach achieves a 3.0% improvement in reducing instruction counts over the compiler, outperforming two state-of-the-art baselines that require thousands of compilations. Furthermore, the model shows surprisingly strong code reasoning abilities, generating compilable code 91% of the time and perfectly emulating the output of the compiler 70% of the time.
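The training setup pairs one unoptimized input with several targets: the option list (the primary task) plus the instruction counts and the optimized code (the auxiliary tasks). The Python sketch below illustrates that structure. It also includes one way to express an "improvement in reducing instruction counts over the compiler." Everything in it is an assumption made for illustration. The class and function names, the bracketed field labels, the tiny IR snippet, the `-instcombine` option, and the summed-count metric are all hypothetical, and none of them is the paper's actual data format or evaluation code.

```python
from dataclasses import dataclass


@dataclass
class TrainingExample:
    """Hypothetical training pair: unoptimized IR in, optimization answer out."""
    unoptimized_ir: str   # model input: unoptimized LLVM assembly
    pass_list: list       # primary target: compiler options to apply
    count_before: int     # auxiliary target: instruction count before optimization
    count_after: int      # auxiliary target: instruction count after optimization
    optimized_ir: str     # auxiliary target: the optimized code itself


def format_target(ex: TrainingExample) -> str:
    """Serialize all targets into one output string (the layout is an assumption)."""
    return "\n".join([
        "[passes] " + " ".join(ex.pass_list),
        f"[instcount before] {ex.count_before}",
        f"[instcount after] {ex.count_after}",
        "[optimized code]",
        ex.optimized_ir,
    ])


def reduction_over_baseline(model_counts: list, baseline_counts: list) -> float:
    """Percent fewer instructions than the baseline, using totals summed over a suite
    (an assumed aggregation; other choices, such as per-program means, are possible)."""
    total_model = sum(model_counts)
    total_baseline = sum(baseline_counts)
    return 100.0 * (total_baseline - total_model) / total_baseline


# Illustrative example: instcombine folds `add %x, 0` into `%x`, dropping one instruction.
example = TrainingExample(
    unoptimized_ir="define i32 @f(i32 %x) {\n  %a = add i32 %x, 0\n  ret i32 %a\n}",
    pass_list=["-instcombine"],
    count_before=2,
    count_after=1,
    optimized_ir="define i32 @f(i32 %x) {\n  ret i32 %x\n}",
)

if __name__ == "__main__":
    print(format_target(example))
    # Made-up counts for two programs: 190 vs. 200 total instructions.
    print(reduction_over_baseline([90, 100], [100, 100]))
```

Serializing every target into one string lets a single decoder-only model learn the primary and auxiliary tasks together. The auxiliary fields can simply be discarded at inference time, when only the option list is needed.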