
**sui-1: Grounded and Verifiable Long-Form Summarization**

January 13, 2026
Authors: Benedikt Droste, Jan Philipp Harries, Maximilian Idahl, Björn Plüster
cs.AI

Abstract

Large language models frequently generate plausible but unfaithful summaries that users cannot verify against source text, a critical limitation in compliance-sensitive domains such as government and legal analysis. We present sui-1, a 24B parameter model that produces abstractive summaries with inline citations, enabling users to trace each claim to its source sentence. Our synthetic data pipeline combines chain-of-thought prompting with multi-stage verification, generating over 22,000 high-quality training examples across five languages from diverse sources including parliamentary documents, web text, and Wikipedia. Evaluation shows sui-1 significantly outperforms all tested open-weight baselines, including models with 3x more parameters. These results demonstrate that task-specific training substantially outperforms scale alone for citation-grounded summarization. Model weights and an interactive demo are publicly available.
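The core mechanism described above is citation grounding: each claim in the abstractive summary carries an inline pointer back to a specific source sentence, which makes the output mechanically checkable rather than merely plausible. Below is a minimal sketch of what such a verification step could look like, assuming a hypothetical bracketed `[S<n>]` citation format and sentence-level indexing of the source document; the paper does not specify sui-1's actual citation syntax, so the format, function names, and example data here are illustrative only.

```python
import re

# Hypothetical sketch: check that every inline citation in a generated summary
# resolves to a real sentence in the source document, and flag summary
# sentences that carry no citation at all. The [S<n>] marker format is an
# assumption for illustration, not sui-1's documented output format.

CITATION_PATTERN = re.compile(r"\[S(\d+)\]")

def verify_citations(summary: str, source_sentences: list[str]) -> dict:
    """Return resolved citations, dangling indices, and uncited summary sentences."""
    results = {"resolved": [], "dangling": [], "uncited_sentences": []}
    for sentence in re.split(r"(?<=[.!?])\s+", summary.strip()):
        indices = [int(m) for m in CITATION_PATTERN.findall(sentence)]
        if not indices:
            # A claim without any citation cannot be traced back to the source.
            results["uncited_sentences"].append(sentence)
        for idx in indices:
            if 0 <= idx < len(source_sentences):
                results["resolved"].append((idx, source_sentences[idx]))
            else:
                # Citation points outside the source document.
                results["dangling"].append(idx)
    return results

if __name__ == "__main__":
    source = [
        "The committee met on 12 March.",
        "It approved the revised budget.",
        "Two members abstained from the vote.",
    ]
    summary = "The committee approved the revised budget [S1], with two abstentions [S2]."
    print(verify_citations(summary, source))
```

A check of this kind is also the natural building block for the multi-stage verification the authors describe for their synthetic data pipeline: candidate summaries whose citations do not resolve, or whose claims lack citations, can be filtered out before training.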