架构后门：用于批量内数据窃取与模型推理操控

摘要

近十年来，学术界一直在研究神经网络中的后门问题，主要集中在分类任务上，即攻击者如何操控模型预测。尽管这些改变预测的攻击明显具有恶意，但其对现实世界的直接影响尚不明确。本文提出了一类新颖且更为强大的后门攻击，基于近期在架构后门方面的进展。我们展示了如何专门设计这些后门以利用批处理推理这一常见的硬件利用技术，从而实现大规模用户数据操纵和窃取。通过针对批处理过程，这些架构后门促进了并发用户请求间的信息泄露，并允许攻击者完全控制同一批次内其他用户的模型响应。换言之，能够改变模型架构的攻击者可以设置并窃取同一批次内其他用户的模型输入和输出。我们证明此类攻击不仅可行，而且效果惊人，能够轻易注入到流行模型架构中，对用户隐私和系统完整性构成了真正的恶意威胁。至关重要的是，为应对这一新型漏洞，我们提出了一种确定性缓解策略，提供了针对这一新攻击向量的正式保证，与之前依赖大型语言模型发现后门的工作不同。我们的缓解策略采用了一种新颖的信息流控制机制，通过分析模型图并证明同一批次内不同用户输入间的无干扰性。利用我们的缓解策略，我们对托管在Hugging Face上的模型进行了大规模分析，发现超过200个模型由于使用动态量化而引入了（非预期的）批次条目间信息泄露。

English

For nearly a decade the academic community has investigated backdoors in neural networks, primarily focusing on classification tasks where adversaries manipulate the model prediction. While demonstrably malicious, the immediate real-world impact of such prediction-altering attacks has remained unclear. In this paper we introduce a novel and significantly more potent class of backdoors that builds upon recent advancements in architectural backdoors. We demonstrate how these backdoors can be specifically engineered to exploit batched inference, a common technique for hardware utilization, enabling large-scale user data manipulation and theft. By targeting the batching process, these architectural backdoors facilitate information leakage between concurrent user requests and allow attackers to fully control model responses directed at other users within the same batch. In other words, an attacker who can change the model architecture can set and steal model inputs and outputs of other users within the same batch. We show that such attacks are not only feasible but also alarmingly effective, can be readily injected into prevalent model architectures, and represent a truly malicious threat to user privacy and system integrity. Critically, to counteract this new class of vulnerabilities, we propose a deterministic mitigation strategy that provides formal guarantees against this new attack vector, unlike prior work that relied on Large Language Models to find the backdoors. Our mitigation strategy employs a novel Information Flow Control mechanism that analyzes the model graph and proves non-interference between different user inputs within the same batch. Using our mitigation strategy we perform a large scale analysis of models hosted through Hugging Face and find over 200 models that introduce (unintended) information leakage between batch entries due to the use of dynamic quantization.

架构后门：用于批量内数据窃取与模型推理操控

Architectural Backdoors for Within-Batch Data Stealing and Model Inference Manipulation

摘要

Support