Frankentext: Stitching random text fragments into long-form narratives

Abstract

We introduce Frankentexts, a new type of long-form narrative produced by LLMs under the extreme constraint that most tokens (e.g., 90%) must be copied verbatim from human-written text. This task is a challenging test of controllable generation: the model must satisfy a writing prompt, integrate disparate text fragments, and still produce a coherent narrative. To generate Frankentexts, we instruct the model to produce a draft by selecting and combining human-written passages, then to iteratively revise the draft while maintaining a user-specified copy ratio. We evaluate the resulting Frankentexts along three axes: writing quality, instruction adherence, and detectability. Gemini-2.5-Pro performs surprisingly well on this task: 81% of its Frankentexts are coherent and 100% are relevant to the prompt. Notably, up to 59% of these outputs are misclassified as human-written by detectors such as Pangram, revealing limitations in AI text detectors. Human annotators can sometimes identify Frankentexts through their abrupt tone shifts and inconsistent grammar between segments, especially in longer generations. Beyond presenting a challenging generation task, Frankentexts invite discussion on building effective detectors for this new grey zone of authorship, provide training data for mixed-authorship detection, and serve as a sandbox for studying human-AI co-writing processes.
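The abstract does not say how the copy ratio is measured. As a rough illustration only, the sketch below assumes whitespace tokenization and counts an output token as copied when it falls inside a word trigram that also appears verbatim in some source passage. The names `ngrams` and `copy_ratio`, the trigram window, and the toy passages are hypothetical choices, not the authors' procedure.

```python
def ngrams(tokens, n):
    """Return the set of all contiguous n-token tuples in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def copy_ratio(output, sources, n=3):
    """Fraction of output tokens covered by an n-gram found verbatim in any source.

    Assumption: whitespace tokenization and exact, case-sensitive matching.
    """
    out = output.split()
    if len(out) < n:
        return 0.0

    # Collect every n-gram that occurs in the human-written passages.
    source_ngrams = set()
    for passage in sources:
        source_ngrams |= ngrams(passage.split(), n)

    # Mark each output token that lies inside at least one matching n-gram.
    covered = [False] * len(out)
    for i in range(len(out) - n + 1):
        if tuple(out[i:i + n]) in source_ngrams:
            for j in range(i, i + n):
                covered[j] = True

    return sum(covered) / len(out)


# Toy example: two "human-written" passages and a stitched draft.
sources = ["the cat sat on the mat", "a storm rolled in from the sea"]
draft = "the cat sat on the mat while a storm rolled in"
ratio = copy_ratio(draft, sources)
meets_target = ratio >= 0.9  # compare against a user-specified copy ratio
```

A revision loop under this assumption could recompute `copy_ratio` after each edit and reject revisions that push the value below the target.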
