Robust and Fine-Grained Detection of AI-Generated Texts

Abstract

An ideal detection system for machine-generated content should work well with any generator, since ever more advanced LLMs appear day by day. Existing systems often struggle to identify AI-generated content accurately in shorter texts. Furthermore, not every text is authored entirely by a human or entirely by an LLM, so we focus on partial cases, i.e., human-LLM co-authored texts. Our paper introduces a set of models built for token classification and trained on an extensive collection of human-machine co-authored texts. These models performed well on texts from unseen domains, texts from unseen generators, texts by non-native speakers, and texts containing adversarial inputs. We also introduce a new dataset of over 2.4M such texts, mostly co-authored by several popular proprietary LLMs and spanning 23 languages. We further report our models' performance on the texts of each domain and each generator. Additional findings include a comparison of performance against each adversarial method, the effect of input text length, and the characteristics of generated texts compared with the original human-authored texts.
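Token classification differs from whole-document classification: each token of a co-authored text receives its own label, which makes it possible to locate where human writing ends and machine writing begins. The following is a minimal Python sketch of that framing under stated assumptions, not the authors' models or data format. It assumes binary labels (0 = human, 1 = machine), a human-written prefix followed by an LLM continuation, and whitespace tokenization in place of a real subword tokenizer. The helper names `make_token_labels`, `labels_to_spans`, and `token_accuracy` are hypothetical.

```python
from typing import List, Tuple

HUMAN, MACHINE = 0, 1  # hypothetical label ids assumed for this sketch


def make_token_labels(human_part: str, machine_part: str) -> Tuple[List[str], List[int]]:
    """Build (tokens, labels) for a co-authored text, assuming a human-written
    part followed by an LLM-written continuation. Whitespace tokenization is a
    simplifying assumption; real systems use subword tokenizers."""
    human_tokens = human_part.split()
    machine_tokens = machine_part.split()
    tokens = human_tokens + machine_tokens
    labels = [HUMAN] * len(human_tokens) + [MACHINE] * len(machine_tokens)
    return tokens, labels


def labels_to_spans(labels: List[int]) -> List[Tuple[int, int, int]]:
    """Collapse per-token labels into contiguous (start, end_exclusive, label) runs."""
    spans: List[Tuple[int, int, int]] = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the sequence or when the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            spans.append((start, i, labels[start]))
            start = i
    return spans


def token_accuracy(pred: List[int], gold: List[int]) -> float:
    """Fraction of tokens whose predicted label matches the gold label."""
    if len(pred) != len(gold):
        raise ValueError("prediction and gold sequences must have equal length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


if __name__ == "__main__":
    tokens, gold = make_token_labels(
        "I started writing this note by hand",
        "and then a language model finished it smoothly.",
    )
    # A made-up prediction whose human/machine boundary is one token too early.
    pred = [HUMAN] * 6 + [MACHINE] * 9
    print(labels_to_spans(gold))   # [(0, 7, 0), (7, 15, 1)]
    print(labels_to_spans(pred))   # [(0, 6, 0), (6, 15, 1)]
    print(round(token_accuracy(pred, gold), 3))  # 0.933
```

Collapsing per-token predictions into runs makes boundary errors easy to inspect: in the example, the predicted boundary falls one token before the true one, which costs exactly one token of accuracy out of fifteen.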