FlashRT:面向计算与内存高效的红队测试——针对提示注入与知识污染的安全评估
FlashRT: Towards Computationally and Memory Efficient Red-Teaming for Prompt Injection and Knowledge Corruption
April 30, 2026
作者: Yanting Wang, Chenlong Yin, Ying Chen, Jinyuan Jia
cs.AI
摘要
长上下文大语言模型(LLMs)——例如Gemini-3.1-Pro与Qwen-3.5——正被广泛应用于检索增强生成、自主智能体和AI助手等现实场景。然而,其大规模部署仍面临严重的安全隐忧,包括提示注入与知识污染等威胁。为量化LLMs在此类威胁下的安全风险,研究界已开发出基于启发式与基于优化的红队测试方法。基于优化的方法通常能产生比启发式攻击更强的对抗样本,从而为LLM安全风险提供更严苛的评估。但这类方法往往需要消耗大量计算资源与GPU显存,尤其在长上下文场景下更为显著。这种高资源消耗特性成为系统化评估长上下文LLM安全风险及大规模检验防御策略效果的主要障碍(对学术研究者尤为突出)。本研究提出FlashRT框架,首次实现了长上下文LLMs下基于优化的提示注入与知识污染攻击在计算效率和内存占用方面的双重提升。经广泛测试表明,相较于最先进的基线方法nanoGCG,FlashRT可持续实现2-7倍加速(例如将运行时间从1小时缩短至10分钟内),并降低2-4倍GPU显存消耗(针对32K词元上下文,显存占用从264.1GB降至65.7GB)。该框架可广泛应用于TAP、AutoDAN等黑盒优化方法。我们期待FlashRT能作为红队测试工具,助力长上下文LLM安全性的系统化评估。代码已开源:https://github.com/Wang-Yanting/FlashRT
English
Long-context large language models (LLMs)-for example, Gemini-3.1-Pro and Qwen-3.5-are widely used to empower many real-world applications, such as retrieval-augmented generation, autonomous agents, and AI assistants. However, security remains a major concern for their widespread deployment, with threats such as prompt injection and knowledge corruption. To quantify the security risks faced by LLMs under these threats, the research community has developed heuristic-based and optimization-based red-teaming methods. Optimization-based methods generally produce stronger attacks than heuristic attacks and thus provide a more rigorous assessment of LLM security risks. However, they are often resource-intensive, requiring significant computation and GPU memory, especially for long context scenarios. The resource-intensive nature poses a major obstacle for the community (especially academic researchers) to systematically evaluate the security risks of long-context LLMs and assess the effectiveness of defense strategies at scale. In this work, we propose FlashRT, the first framework to improve the efficiency (in terms of both computation and memory) for optimization-based prompt injection and knowledge corruption attacks under long-context LLMs. Through extensive evaluations, we find that FlashRT consistently delivers a 2x-7x speedup (e.g., reducing runtime from one hour to less than ten minutes) and a 2x-4x reduction in GPU memory consumption (e.g., reducing from 264.1 GB to 65.7 GB GPU memory for a 32K token context) compared to state-of-the-art baseline nanoGCG. FlashRT can be broadly applied to black-box optimization methods, such as TAP and AutoDAN. We hope FlashRT can serve as a red-teaming tool to enable systematic evaluation of long-context LLM security. The code is available at: https://github.com/Wang-Yanting/FlashRT