FlashRT: Towards Computationally and Memory Efficient Red-Teaming for Prompt Injection and Knowledge Corruption

Abstract

Long-context large language models (LLMs), such as Gemini-3.1-Pro and Qwen-3.5, are widely used to power real-world applications including retrieval-augmented generation, autonomous agents, and AI assistants. However, security remains a major concern for their widespread deployment, with threats such as prompt injection and knowledge corruption. To quantify the security risks that LLMs face under these threats, the research community has developed heuristic-based and optimization-based red-teaming methods. Optimization-based methods generally produce stronger attacks than heuristic ones and thus provide a more rigorous assessment of LLM security risks. However, they are often resource-intensive, requiring significant computation and GPU memory, especially in long-context scenarios. This resource cost is a major obstacle for the community (especially academic researchers) in systematically evaluating the security risks of long-context LLMs and assessing the effectiveness of defense strategies at scale. In this work, we propose FlashRT, the first framework to improve the efficiency, in terms of both computation and memory, of optimization-based prompt injection and knowledge corruption attacks on long-context LLMs. Through extensive evaluations, we find that FlashRT consistently delivers a 2x-7x speedup (e.g., reducing runtime from one hour to less than ten minutes) and a 2x-4x reduction in GPU memory consumption (e.g., from 264.1 GB to 65.7 GB for a 32K-token context) compared to the state-of-the-art baseline nanoGCG. FlashRT can also be broadly applied to black-box optimization methods, such as TAP and AutoDAN. We hope FlashRT can serve as a red-teaming tool that enables systematic evaluation of long-context LLM security. The code is available at: https://github.com/Wang-Yanting/FlashRT

