

ConsumerBench: Benchmarking Generative AI Applications on End-User Devices

June 21, 2025
作者: Yile Gu, Rohan Kadekodi, Hoang Nguyen, Keisuke Kamahori, Yiyu Liu, Baris Kasikci
cs.AI

Abstract

The recent shift in Generative AI (GenAI) applications from cloud-only environments to end-user devices introduces new challenges in resource management, system efficiency, and user experience. This paper presents ConsumerBench, a comprehensive benchmarking framework designed to evaluate the system efficiency and response time of GenAI models running on end-user devices. Unlike existing benchmarks that assume exclusive model access on dedicated GPUs, ConsumerBench simulates realistic multi-application scenarios executing concurrently on constrained hardware. Furthermore, ConsumerBench supports customizable workflows that simulate complex tasks requiring coordination among multiple applications. ConsumerBench captures both application-level metrics, including latency and Service Level Objective (SLO) attainment, and system-level metrics like CPU/GPU utilization and memory bandwidth. Through extensive experiments, ConsumerBench reveals inefficiencies in resource sharing, unfair scheduling under greedy allocation, and performance pitfalls of static model server configurations. The paper also provides practical insights for model developers and system designers, highlighting the benefits of custom kernels tailored to consumer-grade GPU architectures and the value of implementing SLO-aware scheduling strategies.
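Among the application-level metrics the abstract names is Service Level Objective (SLO) attainment. As a minimal illustrative sketch (not ConsumerBench's actual implementation; the function name and thresholds here are assumptions), SLO attainment can be computed as the fraction of requests whose observed latency meets a target:

```python
def slo_attainment(latencies_ms, slo_ms):
    """Return the fraction of requests whose latency meets the SLO target.

    latencies_ms: observed per-request latencies in milliseconds.
    slo_ms: the latency target (e.g. a time-to-first-token budget).
    """
    if not latencies_ms:
        return 0.0
    met = sum(1 for latency in latencies_ms if latency <= slo_ms)
    return met / len(latencies_ms)

# Hypothetical example: six chatbot requests against a 250 ms SLO.
# Four of the six latencies are within budget, so attainment is 4/6.
latencies = [120, 240, 310, 180, 500, 220]
print(slo_attainment(latencies, 250))
```

Under contention from concurrently running applications on constrained hardware, tail latencies inflate and this fraction drops, which is why the paper argues for SLO-aware scheduling rather than greedy resource allocation.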