Parrot: Efficient Serving of LLM-based Applications with Semantic Variable

Abstract

The rise of large language models (LLMs) has enabled LLM-based applications (a.k.a. AI agents or co-pilots), a new software paradigm that combines the strengths of LLMs and conventional software. Diverse LLM applications from different tenants can design complex workflows that use multiple LLM requests to accomplish a single task. However, they have to use the oversimplified request-level API provided by today's public LLM services, which loses essential application-level information. As a result, public LLM services have to optimize individual LLM requests blindly, leading to sub-optimal end-to-end performance for LLM applications. This paper introduces Parrot, an LLM service system that focuses on the end-to-end experience of LLM-based applications. Parrot proposes Semantic Variable, a unified abstraction that exposes application-level knowledge to public LLM services. A Semantic Variable annotates an input/output variable in the prompt of a request and creates the data pipeline when multiple LLM requests are connected, providing a natural way to program LLM applications. Exposing Semantic Variables to the public LLM service allows it to perform conventional data flow analysis to uncover the correlation across multiple LLM requests. This correlation opens a brand-new optimization space for the end-to-end performance of LLM-based applications. Extensive evaluations demonstrate that Parrot can achieve up to an order-of-magnitude improvement for popular and practical use cases of LLM applications.
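
The idea of annotating prompt variables and recovering request dependencies through data flow analysis can be illustrated with a small sketch. This is not Parrot's API: the `{{input:name}}` / `{{output:name}}` placeholder syntax, the `Request` class, and the `build_dependencies` helper are all hypothetical names chosen for illustration. The sketch only shows how a service that can see named variables could, in principle, derive which requests feed which.

```python
import re
from dataclasses import dataclass
from graphlib import TopologicalSorter

# Hypothetical placeholder syntax (an assumption, not Parrot's):
# {{input:name}} marks a consumed variable, {{output:name}} a produced one.
PLACEHOLDER = re.compile(r"\{\{(input|output):(\w+)\}\}")


@dataclass
class Request:
    """One LLM request whose prompt template carries annotated variables."""
    name: str
    template: str

    def variables(self, kind):
        # Return the variable names of the given kind ("input" or "output").
        return [var for k, var in PLACEHOLDER.findall(self.template) if k == kind]


def build_dependencies(requests):
    """Map each request name to the set of request names that produce its inputs."""
    producer = {}
    for req in requests:
        for var in req.variables("output"):
            producer[var] = req.name
    return {
        req.name: {producer[v] for v in req.variables("input") if v in producer}
        for req in requests
    }


# Illustrative three-step workflow; "task" comes from the user, so it has no producer.
requests = [
    Request("write_code", "Write Python code for {{input:task}}. Code: {{output:code}}"),
    Request("write_tests", "Write tests for this code: {{input:code}}. Tests: {{output:tests}}"),
    Request("review", "Review {{input:code}} against {{input:tests}}. Review: {{output:review}}"),
]

deps = build_dependencies(requests)
order = list(TopologicalSorter(deps).static_order())
print(deps)
print(order)
```

With a plain request-level API, the service would see three unrelated prompts. Once the variables are named, the dependency graph becomes visible, which is the kind of cross-request correlation the abstract says enables end-to-end optimization.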
