Relax: Composable Abstractions for End-to-End Dynamic Machine Learning

Abstract

Dynamic shape computations have become critical in modern machine learning workloads, especially in emerging large language models. The success of these models has driven demand for deploying them to a diverse set of backend environments. In this paper, we present Relax, a compiler abstraction for optimizing end-to-end dynamic machine learning workloads. Relax introduces first-class symbolic shape annotations to track dynamic shape computations globally across the program. It also introduces a cross-level abstraction that encapsulates computational graphs, loop-level tensor programs, and library calls in a single representation to enable cross-level optimizations. We build an end-to-end compilation framework using the proposed approach to optimize dynamic shape models. Experimental results on large language models show that Relax delivers performance competitive with state-of-the-art hand-optimized systems across platforms and enables deployment of emerging dynamic models to a broader set of environments, including mobile phones, embedded devices, and web browsers.
