Authorship Attribution in the Era of LLMs: Problems, Methodologies, and Challenges
August 16, 2024
Authors: Baixiang Huang, Canyu Chen, Kai Shu
cs.AI
Abstract
Accurate attribution of authorship is crucial for maintaining the integrity
of digital content, improving forensic investigations, and mitigating the risks
of misinformation and plagiarism. Addressing the imperative need for proper
authorship attribution is essential to uphold the credibility and
accountability of authentic authorship. The rapid advancements of Large
Language Models (LLMs) have blurred the lines between human and machine
authorship, posing significant challenges for traditional methods. We present
a comprehensive literature review that examines the latest research on
authorship attribution in the era of LLMs. This survey systematically explores
the landscape of this field by categorizing four representative problems: (1)
Human-written Text Attribution; (2) LLM-generated Text Detection; (3)
LLM-generated Text Attribution; and (4) Human-LLM Co-authored Text Attribution.
We also discuss the challenges related to ensuring the generalization and
explainability of authorship attribution methods. Generalization requires that
methods transfer across various domains, while explainability emphasizes
providing transparent and understandable insights into the decisions made by
these models. By evaluating the strengths and limitations of existing methods
and benchmarks, we identify key open problems and future research directions in
this field. This literature review serves as a roadmap for researchers and
practitioners interested in understanding the state of the art in this rapidly
evolving field. Additional resources and a curated list of papers are available
and regularly updated at https://llm-authorship.github.io.