Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
November 1, 2023
Authors: Ruihang Lai, Junru Shao, Siyuan Feng, Steven S. Lyubomirsky, Bohan Hou, Wuwei Lin, Zihao Ye, Hongyi Jin, Yuchen Jin, Jiawei Liu, Lesheng Jin, Yaxing Cai, Ziheng Jiang, Yong Wu, Sunghyun Park, Prakalp Srivastava, Jared G. Roesch, Todd C. Mowry, Tianqi Chen
cs.AI
Abstract
Dynamic shape computations have become critical in modern machine learning
workloads, especially in emerging large language models. The success of these
models has driven demand for deploying them to a diverse set of backend
environments. In this paper, we present Relax, a compiler abstraction for
optimizing end-to-end dynamic machine learning workloads. Relax introduces
first-class symbolic shape annotations to track dynamic shape computations
globally across the program. It also introduces a cross-level abstraction that
encapsulates computational graphs, loop-level tensor programs, and library
calls in a single representation to enable cross-level optimizations. We build
an end-to-end compilation framework using the proposed approach to optimize
dynamic shape models. Experimental results on large language models show that
Relax delivers performance competitive with state-of-the-art hand-optimized
systems across platforms and enables deployment of emerging dynamic models to a
broader set of environments, including mobile phones, embedded devices, and web
browsers.
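To make the idea of first-class symbolic shape annotations concrete, the sketch below shows how a symbolic dimension (e.g., a dynamic sequence length `n`) can be tracked through a shape rule. This is a minimal illustration in plain Python, not Relax's actual API; the names `SymVar`, `Tensor`, and `matmul` are invented for this example.

```python
# Minimal sketch of symbolic shape tracking, illustrating the concept of
# first-class symbolic shape annotations described in the abstract.
# NOTE: invented names -- this is not the Relax API.

class SymVar:
    """A symbolic dimension, e.g. a dynamic sequence length n."""
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name

class Tensor:
    """A tensor whose shape mixes concrete ints and symbolic variables."""
    def __init__(self, shape):
        self.shape = tuple(shape)

def matmul(a, b):
    # Shape rule: (m, k) x (k, n) -> (m, n).
    # Symbolic dimensions propagate through the result unchanged,
    # so downstream passes can reason about them globally.
    (m, k1), (k2, n) = a.shape, b.shape
    assert k1 == k2, "inner dimensions must agree"
    return Tensor((m, n))

n = SymVar("n")            # unknown at compile time (e.g., sequence length)
x = Tensor((n, 512))       # activations: (n, 512)
w = Tensor((512, 1024))    # weights: (512, 1024)
y = matmul(x, w)
print(y.shape)             # (n, 1024) -- the symbolic dim flows through
```

In Relax proper, such symbolic dimensions are part of the IR's type/shape annotations, so the compiler can track relations like `(n, 512) x (512, 1024) -> (n, 1024)` across the whole program rather than falling back to fully unknown shapes.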