Exploiting Instruction-Following Retrievers for Malicious Information Retrieval

March 11, 2025
Authors: Parishad BehnamGhader, Nicholas Meade, Siva Reddy
cs.AI

Abstract

Instruction-following retrievers have been widely adopted alongside LLMs in real-world applications, but little work has investigated the safety risks surrounding their increasing search capabilities. We empirically study the ability of retrievers to satisfy malicious queries, both when used directly and when used in a retrieval-augmented generation (RAG) setup. Concretely, we investigate six leading retrievers, including NV-Embed and LLM2Vec, and find that given malicious requests, most retrievers can (for >50% of queries) select relevant harmful passages. For example, LLM2Vec correctly selects passages for 61.35% of our malicious queries. We further uncover an emerging risk with instruction-following retrievers, where highly relevant harmful information can be surfaced by exploiting their instruction-following capabilities. Finally, we show that even safety-aligned LLMs, such as Llama3, can satisfy malicious requests when provided with harmful retrieved passages in-context. In summary, our findings underscore the malicious misuse risks associated with increasing retriever capability.
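
The retrievers studied here are instruction-following dense embedders: a task instruction is combined with the query, and candidate passages are ranked by embedding similarity. The sketch below illustrates that general mechanism with a benign query; the model name ("all-MiniLM-L6-v2"), instruction, and passages are illustrative placeholders under assumed settings, not the paper's actual evaluation setup (which uses retrievers such as NV-Embed and LLM2Vec).

```python
# Minimal sketch of instruction-following dense retrieval: prepend a task
# instruction to the query, embed query and passages, rank by cosine
# similarity. All names and texts here are illustrative placeholders.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

instruction = "Given a question, retrieve passages that answer it."
query = "How do plants convert sunlight into chemical energy?"
passages = [
    "Photosynthesis converts light energy into chemical energy in plants.",
    "The French Revolution began in 1789.",
]

# Instruction-following retrievers condition the query representation on the
# task instruction, typically by concatenating instruction and query text.
query_emb = model.encode(f"{instruction} {query}", convert_to_tensor=True)
passage_embs = model.encode(passages, convert_to_tensor=True)

# Rank passages by cosine similarity to the instruction-conditioned query;
# in a RAG setup, the top-ranked passage is placed in the LLM's context.
scores = util.cos_sim(query_emb, passage_embs)[0]
for passage, score in sorted(zip(passages, scores.tolist()), key=lambda p: -p[1]):
    print(f"{score:.3f}  {passage}")
```

Because the instruction steers which passages surface, the paper's core observation follows directly from this mechanism: an adversarially phrased instruction can pull up highly relevant harmful passages, and in a RAG pipeline those passages then enter the LLM's context, where even safety-aligned models may act on them.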
