FlashRT: Towards Computationally and Memory Efficient Red-Teaming for Prompt Injection and Knowledge Corruption

Abstract

Long-context large language models (LLMs), such as Gemini-3.1-Pro and Qwen-3.5, are widely used to power real-world applications such as retrieval-augmented generation, autonomous agents, and AI assistants. However, security remains a major concern for their widespread deployment, with threats such as prompt injection and knowledge corruption. To quantify the security risks that LLMs face under these threats, the research community has developed heuristic-based and optimization-based red-teaming methods. Optimization-based methods generally produce stronger attacks than heuristic methods and thus provide a more rigorous assessment of LLM security risks. However, they are often resource-intensive, requiring significant computation and GPU memory, especially in long-context scenarios. This cost is a major obstacle for the community (especially academic researchers) in systematically evaluating the security risks of long-context LLMs and assessing the effectiveness of defense strategies at scale. In this work, we propose FlashRT, the first framework to improve the efficiency (in terms of both computation and memory) of optimization-based prompt injection and knowledge corruption attacks on long-context LLMs. Through extensive evaluations, we find that FlashRT consistently delivers a 2x-7x speedup (e.g., reducing runtime from one hour to less than ten minutes) and a 2x-4x reduction in GPU memory consumption (e.g., from 264.1 GB to 65.7 GB of GPU memory for a 32K-token context) compared to the state-of-the-art baseline nanoGCG. FlashRT can also be broadly applied to black-box optimization methods, such as TAP and AutoDAN. We hope FlashRT can serve as a red-teaming tool that enables systematic evaluation of long-context LLM security. The code is available at: https://github.com/Wang-Yanting/FlashRT
