ClaimGen-CN: A Large-Scale Dataset for Chinese Legal Claim Generation

Abstract

Legal claims are the plaintiff's demands in a case, and they are essential to guiding judicial reasoning and case resolution. While many studies have focused on improving the efficiency of legal professionals, research on helping non-professionals (e.g., plaintiffs) remains unexplored. This paper addresses the problem of generating legal claims from the facts of a given case. First, we construct ClaimGen-CN, the first dataset for the Chinese legal claim generation task, from a variety of real-world legal disputes. Second, we design an evaluation metric tailored to generated claims that covers two essential dimensions: factuality and clarity. Building on this, we conduct a comprehensive zero-shot evaluation of state-of-the-art general and legal-domain large language models. Our findings highlight the limitations of current models in factual precision and expressive clarity, pointing to the need for more targeted development in this domain. To encourage further exploration of this important task, we will make the dataset publicly available.