AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference

Abstract

AlayaDB is a cutting-edge vector database system, natively architected at AlayaDB AI for efficient and effective long-context inference for Large Language Models (LLMs). Specifically, it decouples the KV cache and attention computation from the LLM inference system and encapsulates them in a novel vector database system. For Model as a Service (MaaS) providers, AlayaDB consumes fewer hardware resources and offers higher generation quality across workloads with different Service Level Objectives (SLOs) than existing alternative solutions (e.g., KV cache disaggregation, retrieval-based sparse attention). The crux of AlayaDB is that it abstracts the attention computation and cache management for LLM inference into a query processing procedure and optimizes performance via a native query optimizer. In this work, we demonstrate the effectiveness of AlayaDB via (i) three use cases from our industry partners and (ii) extensive experimental results on LLM inference benchmarks.
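
To make the idea of treating attention as a query concrete, the following is a minimal sketch of retrieval-based sparse attention for a single query vector: only the k cached keys with the largest dot product are retrieved, and softmax attention is computed over that subset. It is an illustration under stated assumptions, not AlayaDB's API or algorithm; the function name `topk_attention` and its parameters are hypothetical, and the exhaustive scoring step stands in for the vector index a real system would presumably use.

```python
import math


def topk_attention(query, keys, values, k):
    """Approximate softmax attention using only the k best-matching cached keys.

    Hypothetical helper for illustration; not part of AlayaDB.
    """
    # Score every cached key against the query with a dot product.
    # (Assumption: a real system would replace this scan with an index lookup.)
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]

    # "Query processing" step: retrieve the indices of the k highest scores.
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]

    # Scaled softmax restricted to the retrieved keys (max-subtracted for stability).
    scale = math.sqrt(len(query))
    peak = max(scores[i] for i in top)
    weights = [math.exp((scores[i] - peak) / scale) for i in top]
    total = sum(weights)

    # Weighted sum of the values that belong to the retrieved keys.
    dim = len(values[0])
    return [
        sum(w * values[i][d] for w, i in zip(weights, top)) / total
        for d in range(dim)
    ]


# Toy KV cache with three entries; only the best-matching key is retrieved.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
output = topk_attention([10.0, 0.0], keys, values, k=1)
```

Setting `k` to the number of cached keys recovers ordinary dense attention for that query, so `k` acts as the knob that trades generation quality against compute and memory traffic.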
