Frankentext: Stitching random text fragments into long-form narratives
May 23, 2025
Authors: Chau Minh Pham, Jenna Russell, Dzung Pham, Mohit Iyyer
cs.AI
Abstract
We introduce Frankentexts, a new type of long-form narrative produced by
LLMs under the extreme constraint that most tokens (e.g., 90%) must be copied
verbatim from human writings. This task presents a challenging test of
controllable generation, requiring models to satisfy a writing prompt,
integrate disparate text fragments, and still produce a coherent narrative. To
generate Frankentexts, we instruct the model to produce a draft by selecting
and combining human-written passages, then iteratively revise the draft while
maintaining a user-specified copy ratio. We evaluate the resulting Frankentexts
along three axes: writing quality, instruction adherence, and detectability.
Gemini-2.5-Pro performs surprisingly well on this task: 81% of its Frankentexts
are coherent and 100% are relevant to the prompt. Notably, up to 59% of these
outputs are misclassified as human-written by detectors like Pangram, revealing
limitations in AI text detectors. Human annotators can sometimes identify
Frankentexts through their abrupt tone shifts and inconsistent grammar between
segments, especially in longer generations. Beyond presenting a challenging
generation task, Frankentexts invite discussion on building effective detectors
for this new grey zone of authorship, provide training data for mixed
authorship detection, and serve as a sandbox for studying human-AI co-writing
processes.
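
The abstract does not say how the copy ratio is measured, so the following is only a rough sketch of one way to approximate it: count the fraction of draft words that fall inside a word trigram appearing verbatim in some human-written source. The function names (`tokenize`, `copy_ratio`), the whitespace tokenization, and the trigram window are all assumptions made for illustration, not the authors' method.

```python
def tokenize(text):
    # Assumption: lowercase whitespace tokenization; punctuation is not handled.
    return text.lower().split()


def copy_ratio(draft, sources, n=3):
    """Approximate the share of draft tokens copied verbatim from sources.

    A draft token counts as copied if it lies inside at least one n-gram
    that also occurs in any of the source texts.
    """
    draft_tokens = tokenize(draft)
    if len(draft_tokens) < n:
        return 0.0

    # Collect every n-gram that occurs in the human-written sources.
    source_ngrams = set()
    for source in sources:
        tokens = tokenize(source)
        for i in range(len(tokens) - n + 1):
            source_ngrams.add(tuple(tokens[i:i + n]))

    # Mark each draft token covered by a matching n-gram.
    covered = [False] * len(draft_tokens)
    for i in range(len(draft_tokens) - n + 1):
        if tuple(draft_tokens[i:i + n]) in source_ngrams:
            for j in range(i, i + n):
                covered[j] = True

    return sum(covered) / len(draft_tokens)
```

For example, with the sources "the old house stood silent on the hill" and "she opened the door and stepped inside", the draft "the old house stood silent and she opened the door" scores 0.9 under this measure: only the connecting word "and" lies outside every copied trigram. A revision loop like the one the abstract describes could use such a score to check whether a revised draft still meets the user-specified ratio.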