

Architectural Backdoors for Within-Batch Data Stealing and Model Inference Manipulation

May 23, 2025
作者: Nicolas Küchler, Ivan Petrov, Conrad Grobler, Ilia Shumailov
cs.AI

Abstract

For nearly a decade the academic community has investigated backdoors in neural networks, primarily focusing on classification tasks where adversaries manipulate the model prediction. While demonstrably malicious, the immediate real-world impact of such prediction-altering attacks has remained unclear. In this paper we introduce a novel and significantly more potent class of backdoors that builds upon recent advancements in architectural backdoors. We demonstrate how these backdoors can be specifically engineered to exploit batched inference, a common technique for hardware utilization, enabling large-scale user data manipulation and theft. By targeting the batching process, these architectural backdoors facilitate information leakage between concurrent user requests and allow attackers to fully control model responses directed at other users within the same batch. In other words, an attacker who can change the model architecture can set and steal model inputs and outputs of other users within the same batch. We show that such attacks are not only feasible but also alarmingly effective, can be readily injected into prevalent model architectures, and represent a truly malicious threat to user privacy and system integrity. Critically, to counteract this new class of vulnerabilities, we propose a deterministic mitigation strategy that provides formal guarantees against this new attack vector, unlike prior work that relied on Large Language Models to find the backdoors. Our mitigation strategy employs a novel Information Flow Control mechanism that analyzes the model graph and proves non-interference between different user inputs within the same batch. Using our mitigation strategy we perform a large scale analysis of models hosted through Hugging Face and find over 200 models that introduce (unintended) information leakage between batch entries due to the use of dynamic quantization.
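To make the attack surface concrete, the following is a minimal, hypothetical sketch (in PyTorch; not the paper's actual construction) of an architectural component that mixes information across the batch dimension: when a trigger appears in one request, that request's hidden state is replaced by a neighbouring user's, so the attacker's output reveals the victim's data.

```python
import torch
import torch.nn as nn

class CrossBatchLeak(nn.Module):
    """Hypothetical backdoored layer: leaks a neighbouring batch entry's
    activations into any request that carries a trigger value."""

    def __init__(self, trigger_value: float = 1337.0):
        super().__init__()
        self.trigger_value = trigger_value  # hypothetical trigger encoding

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, hidden). A benign layer would treat each row independently.
        is_triggered = (x[:, 0] == self.trigger_value)   # (batch,)
        victim = torch.roll(x, shifts=1, dims=0)         # neighbouring batch entry
        # Triggered rows receive the victim's activations instead of their own.
        return torch.where(is_triggered.unsqueeze(1), victim, x)

# The layer is numerically ordinary, yet the output for a triggered request now
# depends on another user's input within the same batch.
```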
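The unintended leakage the authors report in dynamically quantized models can be illustrated with a small sketch, assuming the quantization scale is computed over the whole batched tensor rather than per example; the exact quantization code in the affected models may differ.

```python
import torch

def dynamic_quantize(x: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    # Scale derived from the max over *all* batch entries (the leaky choice).
    scale = x.abs().max() / (2 ** (n_bits - 1) - 1)
    return torch.round(x / scale) * scale

batch = torch.randn(4, 16)
out_a = dynamic_quantize(batch)

# Perturb only entry 0; under non-interference, entries 1..3 must be unaffected.
batch[0] *= 100.0
out_b = dynamic_quantize(batch)

# Typically prints False: the shared scale changed, so other users' quantized
# outputs changed too, i.e. a cross-batch information flow.
print(torch.allclose(out_a[1:], out_b[1:]))
```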
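The defense described in the abstract statically analyzes the model graph with an Information Flow Control mechanism to prove non-interference between batch entries. The sketch below is only a hypothetical dynamic approximation of that property (a perturbation test over a batched model); unlike a static analysis, it can miss data-dependent channels.

```python
import torch

def check_batch_non_interference(model, batch: torch.Tensor, atol: float = 1e-6) -> bool:
    """Empirically test that perturbing one batch entry leaves all other
    entries' outputs unchanged. Assumes model(batch) returns a tensor whose
    first dimension is the batch dimension."""
    model.eval()
    with torch.no_grad():
        base = model(batch)
        for i in range(batch.shape[0]):
            perturbed = batch.clone()
            perturbed[i] += torch.randn_like(perturbed[i])  # change only entry i
            out = model(perturbed)
            others = [j for j in range(batch.shape[0]) if j != i]
            # Any change in another entry's output signals a cross-batch flow.
            if not torch.allclose(base[others], out[others], atol=atol):
                return False
    return True
```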
