ConsumerBench: Benchmarking Generative AI Applications on End-User Devices

Abstract

The recent shift in Generative AI (GenAI) applications from cloud-only environments to end-user devices introduces new challenges in resource management, system efficiency, and user experience. This paper presents ConsumerBench, a comprehensive benchmarking framework designed to evaluate the system efficiency and response time of GenAI models running on end-user devices. Unlike existing benchmarks, which assume exclusive model access on dedicated GPUs, ConsumerBench simulates realistic multi-application scenarios executing concurrently on constrained hardware. ConsumerBench also supports customizable workflows that simulate complex tasks requiring coordination among multiple applications. It captures both application-level metrics, including latency and Service Level Objective (SLO) attainment, and system-level metrics, such as CPU/GPU utilization and memory bandwidth. Through extensive experiments, ConsumerBench reveals inefficiencies in resource sharing, unfair scheduling under greedy allocation, and performance pitfalls of static model server configurations. The paper also provides practical insights for model developers and system designers, highlighting the benefits of custom kernels tailored to consumer-grade GPU architectures and the value of implementing SLO-aware scheduling strategies.
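As a rough illustration of the application-level metric named above, SLO attainment can be read as the fraction of requests whose latency stays within the application's target. The sketch below is not ConsumerBench's implementation; the `Request` type, the `slo_attainment` function, the application names, and all latency values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    app: str          # hypothetical application name
    latency_s: float  # measured end-to-end latency, in seconds
    slo_s: float      # latency target (SLO) for this request, in seconds


def slo_attainment(requests):
    """Return, per application, the fraction of requests that met their SLO."""
    met, total = {}, {}
    for r in requests:
        total[r.app] = total.get(r.app, 0) + 1
        # A request meets its SLO when its latency does not exceed the target.
        met[r.app] = met.get(r.app, 0) + (1 if r.latency_s <= r.slo_s else 0)
    return {app: met[app] / total[app] for app in total}


# Made-up sample data (not results from the paper).
sample = [
    Request("chatbot", 0.8, 1.0),
    Request("chatbot", 1.4, 1.0),
    Request("image_gen", 4.0, 5.0),
    Request("image_gen", 4.5, 5.0),
]
attainment = slo_attainment(sample)
print(attainment)
```

Reporting attainment per application rather than as one global number makes it easier to see when one application is starved while another meets its target, which is the kind of unfairness the abstract describes under greedy allocation.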
