Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
November 1, 2023
Authors: Ruihang Lai, Junru Shao, Siyuan Feng, Steven S. Lyubomirsky, Bohan Hou, Wuwei Lin, Zihao Ye, Hongyi Jin, Yuchen Jin, Jiawei Liu, Lesheng Jin, Yaxing Cai, Ziheng Jiang, Yong Wu, Sunghyun Park, Prakalp Srivastava, Jared G. Roesch, Todd C. Mowry, Tianqi Chen
cs.AI
Abstract
Dynamic shape computations have become critical in modern machine learning
workloads, especially in emerging large language models. The success of these
models has driven demand for deploying them to a diverse set of backend
environments. In this paper, we present Relax, a compiler abstraction for
optimizing end-to-end dynamic machine learning workloads. Relax introduces
first-class symbolic shape annotations to track dynamic shape computations
globally across the program. It also introduces a cross-level abstraction that
encapsulates computational graphs, loop-level tensor programs, and library
calls in a single representation to enable cross-level optimizations. We build
an end-to-end compilation framework using the proposed approach to optimize
dynamic shape models. Experimental results on large language models show that
Relax delivers performance competitive with state-of-the-art hand-optimized
systems across platforms and enables deployment of emerging dynamic models to a
broader set of environments, including mobile phones, embedded devices, and web
browsers.
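To make the idea of first-class symbolic shape annotations concrete, the sketch below shows how a symbolic dimension (e.g., a dynamic sequence length `n`) can be tracked through a shape rule. This is a minimal illustration in plain Python, not Relax's actual API; the names `SymVar`, `Tensor`, and `matmul` are invented for this example.

```python
# Minimal sketch of symbolic shape tracking, illustrating the concept of
# first-class symbolic shape annotations described in the abstract.
# NOTE: invented names -- this is not the Relax API.

class SymVar:
    """A symbolic dimension, e.g. a dynamic sequence length n."""
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name

class Tensor:
    """A tensor whose shape mixes concrete ints and symbolic variables."""
    def __init__(self, shape):
        self.shape = tuple(shape)

def matmul(a, b):
    # Shape rule: (m, k) x (k, n) -> (m, n).
    # Symbolic dimensions propagate through the result unchanged,
    # so downstream passes can reason about them globally.
    (m, k1), (k2, n) = a.shape, b.shape
    assert k1 == k2, "inner dimensions must agree"
    return Tensor((m, n))

n = SymVar("n")            # unknown at compile time (e.g., sequence length)
x = Tensor((n, 512))       # activations: (n, 512)
w = Tensor((512, 1024))    # weights: (512, 1024)
y = matmul(x, w)
print(y.shape)             # (n, 1024) -- the symbolic dim flows through
```

In Relax proper, such symbolic dimensions are part of the IR's type/shape annotations, so the compiler can track relations like `(n, 512) x (512, 1024) -> (n, 1024)` across the whole program rather than falling back to fully unknown shapes.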