

FlashRT: Towards Computationally and Memory Efficient Red-Teaming for Prompt Injection and Knowledge Corruption

April 30, 2026
Authors: Yanting Wang, Chenlong Yin, Ying Chen, Jinyuan Jia
cs.AI

Abstract

Long-context large language models (LLMs), such as Gemini-3.1-Pro and Qwen-3.5, are widely used to power many real-world applications, including retrieval-augmented generation, autonomous agents, and AI assistants. However, security remains a major concern for their widespread deployment, with threats such as prompt injection and knowledge corruption. To quantify the security risks LLMs face under these threats, the research community has developed heuristic-based and optimization-based red-teaming methods. Optimization-based methods generally produce stronger attacks than heuristic ones and thus provide a more rigorous assessment of LLM security risks. However, they are often resource-intensive, requiring significant computation and GPU memory, especially in long-context scenarios. This resource-intensive nature poses a major obstacle for the community (especially academic researchers) to systematically evaluating the security risks of long-context LLMs and assessing the effectiveness of defense strategies at scale. In this work, we propose FlashRT, the first framework to improve the efficiency, in terms of both computation and memory, of optimization-based prompt injection and knowledge corruption attacks on long-context LLMs. Through extensive evaluations, we find that FlashRT consistently delivers a 2x-7x speedup (e.g., reducing runtime from one hour to less than ten minutes) and a 2x-4x reduction in GPU memory consumption (e.g., from 264.1 GB to 65.7 GB for a 32K-token context) compared to the state-of-the-art baseline nanoGCG. FlashRT can also be broadly applied to black-box optimization methods such as TAP and AutoDAN. We hope FlashRT can serve as a red-teaming tool that enables systematic evaluation of long-context LLM security. The code is available at: https://github.com/Wang-Yanting/FlashRT
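The reported efficiency gains can be sanity-checked directly from the numbers in the abstract. The short sketch below, using only the figures quoted above (one hour vs. under ten minutes of runtime; 264.1 GB vs. 65.7 GB of GPU memory at a 32K-token context), confirms that the concrete examples fall within the claimed 2x-7x speedup and 2x-4x memory-reduction ranges:

```python
# Sanity-check the efficiency figures reported in the abstract.
# All numbers come directly from the abstract; nothing else is assumed.

# Runtime: roughly one hour with nanoGCG vs. under ten minutes with FlashRT.
baseline_minutes = 60.0
flashrt_minutes = 10.0
# Lower bound on the speedup, since FlashRT finishes in *less* than ten minutes.
speedup = baseline_minutes / flashrt_minutes
print(f"speedup >= {speedup:.1f}x")  # 6.0x, within the reported 2x-7x range

# GPU memory for a 32K-token context.
baseline_gb = 264.1
flashrt_gb = 65.7
mem_reduction = baseline_gb / flashrt_gb
print(f"memory reduction ~ {mem_reduction:.1f}x")  # ~4.0x, matching the 2x-4x range
```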
PDF: May 2, 2026