Exploiting Instruction-Following Retrievers for Malicious Information Retrieval

Abstract

Instruction-following retrievers have been widely adopted alongside large language models (LLMs) in real-world applications, but little work has investigated the safety risks that come with their growing search capabilities. We empirically study the ability of retrievers to satisfy malicious queries, both when used directly and when used in a retrieval-augmented generation (RAG) setup. Concretely, we investigate six leading retrievers, including NV-Embed and LLM2Vec, and find that, given malicious requests, most retrievers select relevant harmful passages for more than 50% of queries. For example, LLM2Vec correctly selects passages for 61.35% of our malicious queries. We further uncover an emerging risk with instruction-following retrievers: highly relevant harmful information can be surfaced by exploiting their instruction-following capabilities. Finally, we show that even safety-aligned LLMs, such as Llama3, can satisfy malicious requests when provided with harmful retrieved passages in-context. In summary, our findings underscore the malicious misuse risks associated with increasing retriever capability.
