SHARE: Social-Humanities AI for Research and Education

Abstract

This intermediate technical report introduces the SHARE family of base models and the MIRROR user interface. The SHARE models are the first causal language models fully pretrained by and for the social sciences and humanities (SSH). On our custom SSH Cloze benchmark, their performance in modelling SSH texts is close to that of general-purpose models such as Phi-4 that use 100 times more tokens. The MIRROR user interface is designed for reviewing text inputs from the SSH disciplines while preserving critical engagement. By prototyping a generative AI interface that does not generate any text, we propose a way to harness the capabilities of the SHARE models without compromising the integrity of SSH principles and norms.