Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs

Abstract

The proliferation of Large Language Models (LLMs) accessed via black-box APIs introduces a significant trust challenge: users pay for services based on advertised model capabilities (e.g., size, performance), but providers may covertly substitute the specified model with a cheaper, lower-quality alternative to reduce operational costs. This lack of transparency undermines fairness, erodes trust, and complicates reliable benchmarking. Detecting such substitutions is difficult because the black-box setting typically limits interaction to input-output queries. This paper formalizes the problem of model substitution detection in LLM APIs. We systematically evaluate existing verification techniques, including output-based statistical tests, benchmark evaluations, and log probability analysis, under various realistic attack scenarios such as model quantization, randomized substitution, and benchmark evasion. Our findings reveal the limitations of methods that rely solely on text outputs, especially against subtle or adaptive attacks. While log probability analysis offers stronger guarantees when available, its accessibility is often limited. We conclude by discussing the potential of hardware-based solutions such as Trusted Execution Environments (TEEs) as a pathway toward provable model integrity, highlighting the trade-offs among security, performance, and provider adoption. Code is available at https://github.com/sunblaze-ucb/llm-api-audit.
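As a rough illustration of why log probability analysis can give stronger evidence than text outputs alone, the sketch below compares the per-token log probabilities an API returns for a fixed prompt and continuation against those computed locally from the claimed model. The function names, the tolerance `DEFAULT_TOL`, and the toy numbers are hypothetical; this is a minimal sketch under those assumptions, not the implementation in the linked repository, and a real audit would have to calibrate the tolerance against benign numerical noise.

```python
from typing import Sequence

# Hypothetical tolerance: meant to absorb small numerical noise (e.g., from
# batching or different GPU kernels). It must be calibrated empirically.
DEFAULT_TOL = 0.05


def max_logprob_gap(api_logprobs: Sequence[float],
                    ref_logprobs: Sequence[float]) -> float:
    """Largest absolute per-token difference between two log-prob sequences.

    Both sequences are assumed to score the same prompt and the same
    continuation tokens, in the same order.
    """
    if len(api_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob sequences must cover the same tokens")
    return max((abs(a - r) for a, r in zip(api_logprobs, ref_logprobs)),
               default=0.0)


def flags_substitution(api_logprobs: Sequence[float],
                       ref_logprobs: Sequence[float],
                       tol: float = DEFAULT_TOL) -> bool:
    """Return True if the API's log-probs deviate from the reference beyond tol."""
    return max_logprob_gap(api_logprobs, ref_logprobs) > tol


if __name__ == "__main__":
    # Toy, made-up values for illustration only (not measured data).
    reference = [-0.12, -1.85, -0.40, -2.31]       # claimed model, run locally
    same_model = [-0.121, -1.849, -0.402, -2.309]  # tiny numerical noise
    substitute = [-0.19, -1.62, -0.55, -2.70]      # e.g., a quantized model
    print(flags_substitution(same_model, reference))  # False
    print(flags_substitution(substitute, reference))  # True
```

Using the maximum per-token gap rather than an average makes the check sensitive to a single strongly deviating token; the trade-off is that it is also more sensitive to noise, which is why the tolerance matters. Note that this check only applies when the provider exposes log probabilities at all.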