
**sui-1: Grounded and Verifiable Long-Form Summarization**

January 13, 2026
Authors: Benedikt Droste, Jan Philipp Harries, Maximilian Idahl, Björn Plüster
cs.AI

Abstract

Large language models frequently generate plausible but unfaithful summaries that users cannot verify against source text, a critical limitation in compliance-sensitive domains such as government and legal analysis. We present sui-1, a 24B parameter model that produces abstractive summaries with inline citations, enabling users to trace each claim to its source sentence. Our synthetic data pipeline combines chain-of-thought prompting with multi-stage verification, generating over 22,000 high-quality training examples across five languages from diverse sources including parliamentary documents, web text, and Wikipedia. Evaluation shows sui-1 significantly outperforms all tested open-weight baselines, including models with 3x more parameters. These results demonstrate that task-specific training substantially outperforms scale alone for citation-grounded summarization. Model weights and an interactive demo are publicly available.
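The core mechanism described above is citation grounding: each claim in the abstractive summary carries an inline pointer back to a specific source sentence, which makes the output mechanically checkable rather than merely plausible. Below is a minimal sketch of what such a verification step could look like, assuming a hypothetical bracketed `[S<n>]` citation format and sentence-level indexing of the source document; the paper does not specify sui-1's actual citation syntax, so the format, function names, and example data here are illustrative only.

```python
import re

# Hypothetical sketch: check that every inline citation in a generated summary
# resolves to a real sentence in the source document, and flag summary
# sentences that carry no citation at all. The [S<n>] marker format is an
# assumption for illustration, not sui-1's documented output format.

CITATION_PATTERN = re.compile(r"\[S(\d+)\]")

def verify_citations(summary: str, source_sentences: list[str]) -> dict:
    """Return resolved citations, dangling indices, and uncited summary sentences."""
    results = {"resolved": [], "dangling": [], "uncited_sentences": []}
    for sentence in re.split(r"(?<=[.!?])\s+", summary.strip()):
        indices = [int(m) for m in CITATION_PATTERN.findall(sentence)]
        if not indices:
            # A claim without any citation cannot be traced back to the source.
            results["uncited_sentences"].append(sentence)
        for idx in indices:
            if 0 <= idx < len(source_sentences):
                results["resolved"].append((idx, source_sentences[idx]))
            else:
                # Citation points outside the source document.
                results["dangling"].append(idx)
    return results

if __name__ == "__main__":
    source = [
        "The committee met on 12 March.",
        "It approved the revised budget.",
        "Two members abstained from the vote.",
    ]
    summary = "The committee approved the revised budget [S1], with two abstentions [S2]."
    print(verify_citations(summary, source))
```

A check of this kind is also the natural building block for the multi-stage verification the authors describe for their synthetic data pipeline: candidate summaries whose citations do not resolve, or whose claims lack citations, can be filtered out before training.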