PowerInfer-2: Fast Large Language Model Inference on a Smartphone
June 10, 2024
Authors: Zhenliang Xue, Yixin Song, Zeyu Mi, Le Chen, Yubin Xia, Haibo Chen
cs.AI
Abstract
This paper introduces PowerInfer-2, a framework designed for high-speed
inference of Large Language Models (LLMs) on smartphones, particularly
effective for models whose sizes exceed the device's memory capacity. The key
insight of PowerInfer-2 is to utilize the heterogeneous computation, memory,
and I/O resources in smartphones by decomposing traditional matrix computations
into fine-grained neuron cluster computations. Specifically, PowerInfer-2
features a polymorphic neuron engine that adapts computational strategies for
various stages of LLM inference. Additionally, it introduces segmented neuron
caching and fine-grained neuron-cluster-level pipelining, which effectively
minimize and conceal the overhead caused by I/O operations. The implementation
and evaluation of PowerInfer-2 demonstrate its capability to support a wide
array of LLM models on two smartphones, achieving up to a 29.2x speed increase
compared with state-of-the-art frameworks. Notably, PowerInfer-2 is the first
system to serve the TurboSparse-Mixtral-47B model with a generation rate of
11.68 tokens per second on a smartphone. For models that fit entirely within
the memory, PowerInfer-2 can achieve approximately a 40% reduction in memory
usage while maintaining inference speeds comparable to llama.cpp and MLC-LLM.
For more details, including a demonstration video, please visit the project
site at www.powerinfer.ai/v2.
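To make the core idea concrete, the sketch below illustrates what "decomposing traditional matrix computations into fine-grained neuron cluster computations" could look like in principle. It is not code from the paper: the function name, the fixed `cluster_size`, and the boolean activation mask are all illustrative assumptions; PowerInfer-2's actual engine schedules clusters across heterogeneous processors and overlaps them with I/O.

```python
def neuron_cluster_matvec(W, x, active_mask, cluster_size=4):
    """Conceptual sketch: compute a matrix-vector product one neuron
    cluster at a time, skipping neurons predicted inactive.

    W           : list of weight rows, one per neuron (hypothetical layout)
    x           : input vector
    active_mask : per-neuron activation prediction (assumed given)
    """
    out = [0.0] * len(W)
    active = [i for i, a in enumerate(active_mask) if a]
    # Group active neurons into fixed-size clusters; in PowerInfer-2 a
    # cluster like this is the unit of computation, caching, and I/O
    # scheduling. Here we only perform the per-neuron dot products.
    for start in range(0, len(active), cluster_size):
        cluster = active[start:start + cluster_size]
        for i in cluster:
            out[i] = sum(w * v for w, v in zip(W[i], x))
    return out

# Example: 6 neurons, only neurons 1 and 4 predicted active,
# so only two of the six rows are ever touched.
W = [[1.0] * 4, [2.0] * 4, [3.0] * 4, [4.0] * 4, [5.0] * 4, [6.0] * 4]
x = [1.0, 0.0, 1.0, 0.0]
mask = [False, True, False, False, True, False]
y = neuron_cluster_matvec(W, x, mask)
```

Because clusters are small and independent, a runtime can fetch the weights of one cluster from flash while another cluster is being computed, which is the intuition behind hiding I/O latency with neuron-cluster-level pipelining.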