Certified Mitigation of Worst-Case LLM Copyright Infringement

Abstract

The exposure of large language models (LLMs) to copyrighted material during pre-training raises concerns about unintentional copyright infringement after deployment. This has driven the development of "copyright takedown" methods: post-training approaches aimed at preventing models from generating content substantially similar to copyrighted works. While current mitigation approaches are somewhat effective for average-case risks, we demonstrate that they overlook worst-case copyright risks, exhibited by the existence of long, verbatim quotes from copyrighted sources. We propose BloomScrub, a remarkably simple yet highly effective inference-time approach that provides certified copyright takedown. Our method repeatedly interleaves quote detection with rewriting techniques to transform potentially infringing segments. By leveraging efficient data sketches (Bloom filters), our approach enables scalable copyright screening even for large-scale real-world corpora. When quotes beyond a length threshold cannot be removed, the system can abstain from responding, offering certified risk reduction. Experimental results show that BloomScrub reduces infringement risk, preserves utility, and accommodates different levels of enforcement stringency with adaptive abstention. Our results suggest that lightweight, inference-time methods can be surprisingly effective for copyright prevention.
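The detect, rewrite, and abstain loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the word-level n-gram size, the SHA-256 double-hashing scheme, the filter size, the `threshold` and `max_rounds` defaults, and the `rewrite` callable (standing in for a model-based paraphraser) are all assumptions chosen for illustration.

```python
import hashlib
from typing import Callable, Iterable, List, Optional


class BloomFilter:
    """Minimal Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str) -> List[int]:
        # Double hashing: derive k bit positions from two halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def ngrams(words: List[str], n: int) -> List[str]:
    """All contiguous word n-grams, joined by single spaces."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def index_corpus(texts: Iterable[str], n: int) -> BloomFilter:
    """Insert every word n-gram of the protected corpus into a Bloom filter."""
    bf = BloomFilter()
    for text in texts:
        for gram in ngrams(text.split(), n):
            bf.add(gram)
    return bf


def longest_quote(text: str, bf: BloomFilter, n: int) -> int:
    """Length in words of the longest span covered by consecutive matching n-grams."""
    best = run = 0
    for gram in ngrams(text.split(), n):
        if gram in bf:
            run += 1
            best = max(best, run + n - 1)
        else:
            run = 0
    return best


def scrub(
    response: str,
    bf: BloomFilter,
    rewrite: Callable[[str], str],  # hypothetical paraphraser, e.g. an LLM call
    n: int = 4,
    threshold: int = 8,
    max_rounds: int = 3,
) -> Optional[str]:
    """Alternate detection and rewriting; return None (abstain) if quotes persist."""
    for _ in range(max_rounds):
        if longest_quote(response, bf, n) <= threshold:
            return response
        response = rewrite(response)
    return response if longest_quote(response, bf, n) <= threshold else None


if __name__ == "__main__":
    corpus = ["the quick brown fox jumps over the lazy dog near the quiet river bank"]
    bf = index_corpus(corpus, n=4)
    draft = "He said the quick brown fox jumps over the lazy dog near the quiet river bank today"
    print(longest_quote(draft, bf, n=4))
    print(scrub(draft, bf, rewrite=lambda t: "A fox leapt across a sleepy dog by the river."))
```

In this sketch, a Bloom filter never misses an n-gram that was actually indexed, so the check may over-flag text but cannot overlook an indexed quote. Any returned response has therefore passed the length check against the indexed corpus, and abstaining covers the case where rewriting does not succeed within the round limit.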
