Hardware and Software Platform Inference

Abstract

Buying access to large language model (LLM) inference, rather than self-hosting, is now common business practice because of the significant upfront hardware infrastructure and energy costs. However, a buyer has no mechanism to verify the authenticity of the advertised service, including the serving hardware platform, e.g. that the model is actually being served on an NVIDIA H100. Furthermore, there are reports suggesting that model providers may deliver models that differ slightly from the advertised ones, often so that they run on less expensive hardware. A client may thus pay a premium for access to a capable model on more expensive hardware, yet end up being served a (potentially less capable) cheaper model on cheaper hardware. In this paper we introduce hardware and software platform inference (HSPI), a method for identifying the underlying GPU architecture and software stack of a (black-box) machine learning model solely from its input-output behavior. Our method leverages the inherent differences between GPU architectures and compilers to distinguish between different GPU types and software stacks. By analyzing the numerical patterns in the model's outputs, we propose a classification framework capable of accurately identifying the GPU used for model inference as well as the underlying software configuration. Our findings demonstrate the feasibility of inferring GPU type from black-box models. We evaluate HSPI against models served on different real hardware and find that in a white-box setting we can distinguish between different GPUs with between 83.9% and 100% accuracy. Even in a black-box setting we achieve results up to three times higher than random-guess accuracy.
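
The idea that numerical patterns in outputs can reveal the serving platform can be illustrated with a toy sketch. It is not the HSPI classifier from the paper. It only assumes that two platforms accumulate the same floating-point values in different orders, which is one plausible source of platform-specific output bits because floating-point addition is not associative. All names here (`sum_sequential`, `sum_pairwise`, `fingerprint`, `identify`, `PROBE`) are hypothetical and chosen for illustration.

```python
import struct

# Hypothetical probe input; any fixed input sent to every platform would do.
PROBE = [0.1, 0.2, 0.3]


def sum_sequential(xs):
    """Stand-in for platform A: accumulate left to right."""
    total = 0.0
    for x in xs:
        total += x
    return total


def sum_pairwise(xs):
    """Stand-in for platform B: accumulate as a binary tree."""
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return sum_pairwise(xs[:mid]) + sum_pairwise(xs[mid:])


def fingerprint(value):
    """Exact bit pattern of a float64 value, as a hex string."""
    return struct.pack(">d", value).hex()


# Reference fingerprints collected from platforms we control (assumed setup).
REFERENCES = {
    fingerprint(sum_sequential(PROBE)): "sequential",
    fingerprint(sum_pairwise(PROBE)): "pairwise",
}


def identify(observed_value):
    """Map an observed output to the platform whose bits it matches."""
    return REFERENCES.get(fingerprint(observed_value), "unknown")


if __name__ == "__main__":
    print(identify(sum_sequential(PROBE)))
    print(identify(sum_pairwise(PROBE)))
```

An exact-match lookup works here only because the toy outputs are deterministic and the two orders happen to differ on this probe. Real model outputs are much higher-dimensional, and the abstract describes a classification framework rather than a lookup table.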
