SHARE: Social-Humanities AI for Research and Education

Abstract

This intermediate technical report introduces the SHARE family of base models and the MIRROR user interface. The SHARE models are the first causal language models fully pretrained by and for the social sciences and humanities (SSH). On our custom SSH Cloze benchmark, their performance in modelling SSH texts is close to that of general-purpose models (Phi-4) that use 100 times more training tokens. The MIRROR user interface is designed for reviewing text inputs from the SSH disciplines while preserving critical engagement. By prototyping a generative AI interface that does not generate any text, we propose a way to harness the capabilities of the SHARE models without compromising the integrity of SSH principles and norms.
