

ConsumerBench: Benchmarking Generative AI Applications on End-User Devices

June 21, 2025
Authors: Yile Gu, Rohan Kadekodi, Hoang Nguyen, Keisuke Kamahori, Yiyu Liu, Baris Kasikci
cs.AI

Abstract

The recent shift in Generative AI (GenAI) applications from cloud-only environments to end-user devices introduces new challenges in resource management, system efficiency, and user experience. This paper presents ConsumerBench, a comprehensive benchmarking framework designed to evaluate the system efficiency and response time of GenAI models running on end-user devices. Unlike existing benchmarks that assume exclusive model access on dedicated GPUs, ConsumerBench simulates realistic multi-application scenarios executing concurrently on constrained hardware. Furthermore, ConsumerBench supports customizable workflows that simulate complex tasks requiring coordination among multiple applications. ConsumerBench captures both application-level metrics, including latency and Service Level Objective (SLO) attainment, and system-level metrics like CPU/GPU utilization and memory bandwidth. Through extensive experiments, ConsumerBench reveals inefficiencies in resource sharing, unfair scheduling under greedy allocation, and performance pitfalls of static model server configurations. The paper also provides practical insights for model developers and system designers, highlighting the benefits of custom kernels tailored to consumer-grade GPU architectures and the value of implementing SLO-aware scheduling strategies.
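The application-level SLO-attainment metric mentioned above can be illustrated with a minimal sketch: the fraction of requests whose observed latency falls within a per-application Service Level Objective. The function name and the 250 ms deadline below are hypothetical, not part of ConsumerBench's actual API.

```python
def slo_attainment(latencies_ms, slo_ms):
    """Return the fraction of requests that completed within the SLO deadline."""
    if not latencies_ms:
        return 0.0
    met = sum(1 for t in latencies_ms if t <= slo_ms)
    return met / len(latencies_ms)

# Example: six requests measured against a hypothetical 250 ms latency SLO;
# four of the six meet the deadline.
latencies = [120, 240, 310, 180, 500, 90]
print(f"SLO attainment: {slo_attainment(latencies, slo_ms=250):.0%}")
```

Under concurrent multi-application execution on a shared consumer GPU, this rate would be tracked per application, which is what exposes the unfair scheduling under greedy allocation that the paper reports.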
PDF · June 24, 2025