Relax: Composable Abstractions for End-to-End Dynamic Machine Learning

Abstract

Dynamic shape computations have become critical in modern machine learning workloads, especially in emerging large language models. The success of these models has driven demand for deploying them to a diverse set of backend environments. In this paper, we present Relax, a compiler abstraction for optimizing end-to-end dynamic machine learning workloads. Relax introduces first-class symbolic shape annotations to track dynamic shape computations globally across the program. It also introduces a cross-level abstraction that encapsulates computational graphs, loop-level tensor programs, and library calls in a single representation to enable cross-level optimizations. We build an end-to-end compilation framework using the proposed approach to optimize dynamic shape models. Experimental results on large language models show that Relax delivers performance competitive with state-of-the-art hand-optimized systems across platforms and enables deployment of emerging dynamic models to a broader set of environments, including mobile phones, embedded devices, and web browsers.
