ClaimGen-CN: A Large-scale Chinese Dataset for Legal Claim Generation

Abstract

Legal claims are the plaintiff's demands in a case and are essential to guiding judicial reasoning and case resolution. While many works have focused on improving the efficiency of legal professionals, research on assisting non-professionals (e.g., plaintiffs) remains unexplored. This paper explores the problem of generating legal claims from the facts of a given case. First, we construct ClaimGen-CN, the first dataset for the Chinese legal claim generation task, from a variety of real-world legal disputes. Additionally, we design an evaluation metric tailored to assessing generated claims, which covers two essential dimensions: factuality and clarity. Building on this, we conduct a comprehensive zero-shot evaluation of state-of-the-art general and legal-domain large language models. Our findings highlight the limitations of current models in factual precision and expressive clarity, pointing to the need for more targeted development in this domain. To encourage further exploration of this important task, we will make the dataset publicly available.