Architectural Backdoors for Within-Batch Data Stealing and Model Inference Manipulation

Abstract

For nearly a decade, the academic community has investigated backdoors in neural networks, focusing primarily on classification tasks in which adversaries manipulate the model's prediction. Although such prediction-altering attacks are demonstrably malicious, their immediate real-world impact has remained unclear. In this paper, we introduce a novel and significantly more potent class of backdoors that builds on recent advances in architectural backdoors. We demonstrate how these backdoors can be engineered specifically to exploit batched inference, a common technique for improving hardware utilization, enabling large-scale manipulation and theft of user data. By targeting the batching process, these architectural backdoors leak information between concurrent user requests and allow attackers to fully control the model responses directed at other users within the same batch. In other words, an attacker who can change the model architecture can both set and steal the model inputs and outputs of other users in the same batch. We show that such attacks are not only feasible but also alarmingly effective, can be readily injected into prevalent model architectures, and pose a genuine threat to user privacy and system integrity. Critically, to counteract this new class of vulnerabilities, we propose a deterministic mitigation strategy that provides formal guarantees against this attack vector, unlike prior work that relied on Large Language Models to find the backdoors. Our mitigation strategy employs a novel Information Flow Control mechanism that analyzes the model graph and proves non-interference between different user inputs within the same batch. Using this mitigation strategy, we perform a large-scale analysis of models hosted on Hugging Face and find over 200 models that introduce (unintended) information leakage between batch entries due to the use of dynamic quantization.
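To make the dynamic quantization finding concrete, the following is a minimal sketch of one plausible way such leakage can arise. It assumes a quantization scale computed over the whole batch tensor, which the abstract does not spell out. All function names are hypothetical, and the check is a simple empirical comparison, not the static Information Flow Control analysis the paper proposes.

```python
# Hypothetical illustration only: a toy int8-style dynamic quantizer.
# Assumption (not stated in the abstract): the scale is derived from the
# maximum absolute value of the ENTIRE batch, so rows influence each other.

def quantize_per_tensor(batch):
    """Quantize every row with one scale derived from the whole batch."""
    max_abs = max(abs(x) for row in batch for x in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return [[round(x / scale) * scale for x in row] for row in batch]

def quantize_per_row(batch):
    """Quantize each row with its own scale, keeping rows independent."""
    return [quantize_per_tensor([row])[0] for row in batch]

def leaks_across_batch(fn, victim, other_a, other_b):
    """Return True if the victim's output changes when only the
    co-batched row changes (an empirical non-interference violation)."""
    out_a = fn([victim, other_a])[0]
    out_b = fn([victim, other_b])[0]
    return out_a != out_b

victim = [0.10, -0.25, 0.40]   # one user's input
small = [0.05, 0.02, -0.01]    # co-batched input with small values
large = [50.0, -3.0, 7.0]      # co-batched input with large values

per_tensor_leaks = leaks_across_batch(quantize_per_tensor, victim, small, large)
per_row_leaks = leaks_across_batch(quantize_per_row, victim, small, large)
print(per_tensor_leaks, per_row_leaks)  # prints: True False
```

In this toy setting, the victim's quantized values depend on the magnitude of the other row when the scale is shared, while a per-row scale removes that dependence. An empirical test like this can only reveal a leak for the inputs tried; it cannot prove its absence, which is why a graph-level analysis with formal guarantees is the stronger tool.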
