Optimizing Memory Mapping Using Deep Reinforcement Learning

Abstract

Resource scheduling and allocation is a critical component of many high-impact systems, ranging from congestion control to cloud computing. Finding better solutions to these problems often yields significant savings in resources and time, reduces device wear and tear, and can even lower carbon emissions. In this paper, we focus on a specific instance of a scheduling problem, namely the memory mapping problem that occurs during compilation of machine learning programs: mapping tensors to different memory layers to optimize execution time. We introduce an approach for solving the memory mapping problem using Reinforcement Learning (RL). RL is a solution paradigm well suited to sequential decision-making problems that are amenable to planning, and to combinatorial search spaces with high-dimensional data inputs. We formulate the problem as a single-player game, which we call the mallocGame, such that high-reward trajectories of the game correspond to efficient memory mappings on the target hardware. We also introduce a Reinforcement Learning agent, mallocMuZero, and show that it is capable of playing this game to discover new and improved memory mapping solutions that lead to faster execution times on real ML workloads on ML accelerators. We compare the performance of mallocMuZero to the default solver used by the Accelerated Linear Algebra (XLA) compiler on a benchmark of realistic ML workloads. In addition, we show that mallocMuZero is capable of improving the execution time of the recently published AlphaTensor matrix multiplication model.
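To make the single-player game framing concrete, the following is a minimal toy sketch and not the mallocGame itself. It assumes a single fast memory of fixed capacity, tensors visited in a fixed order, and a reward equal to the negative latency penalty paid when a tensor is left in slow memory. All names here (`Tensor`, `ToyMappingGame`, `greedy_policy`, `exhaustive_best`) and the cost model are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass(frozen=True)
class Tensor:
    name: str
    size: int         # space needed in fast memory (hypothetical units)
    slow_cost: float  # latency penalty if the tensor stays in slow memory


@dataclass
class ToyMappingGame:
    """Toy game: one move per tensor, choosing 'fast' or 'slow' memory."""
    tensors: list
    fast_capacity: int
    step: int = 0
    used: int = 0
    placement: dict = field(default_factory=dict)

    def legal_actions(self):
        # 'slow' is always allowed; 'fast' only if the tensor still fits.
        t = self.tensors[self.step]
        actions = ["slow"]
        if self.used + t.size <= self.fast_capacity:
            actions.append("fast")
        return actions

    def apply(self, action):
        if action not in self.legal_actions():
            raise ValueError(f"illegal action: {action}")
        t = self.tensors[self.step]
        reward = 0.0
        if action == "fast":
            self.used += t.size
        else:
            reward = -t.slow_cost  # pay the penalty for slow memory
        self.placement[t.name] = action
        self.step += 1
        return reward

    def done(self):
        return self.step == len(self.tensors)


def greedy_policy(game):
    # Baseline heuristic: take fast memory whenever it is available.
    return "fast" if "fast" in game.legal_actions() else "slow"


def play(game, policy):
    # Roll out one trajectory and return its total reward.
    total = 0.0
    while not game.done():
        total += game.apply(policy(game))
    return total


def exhaustive_best(tensors, fast_capacity):
    # Brute-force search over all trajectories; feasible only for tiny games.
    best = float("-inf")
    for actions in product(["slow", "fast"], repeat=len(tensors)):
        game = ToyMappingGame(list(tensors), fast_capacity)
        total, legal = 0.0, True
        for action in actions:
            if action not in game.legal_actions():
                legal = False
                break
            total += game.apply(action)
        if legal:
            best = max(best, total)
    return best


tensors = [Tensor("a", 4, 1.0), Tensor("b", 4, 5.0), Tensor("c", 2, 2.0)]
print(play(ToyMappingGame(tensors, 6), greedy_policy))  # -5.0
print(exhaustive_best(tensors, 6))                      # -1.0
```

In this toy instance the greedy rollout fills fast memory with tensor `a` and must then leave the more expensive tensor `b` in slow memory, while the best trajectory keeps `a` in slow memory. The gap between the two returns illustrates why early placement decisions benefit from planning, which is the kind of structure a search-based RL agent is meant to exploit.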
