

Small Language Models for Privacy-Preserving Clinical Information Extraction in Low-Resource Languages

February 24, 2026
Authors: Mohammadreza Ghaffarzadeh-Esfahani, Nahid Yousefian, Ebrahim Heidari-Farsani, Ali Akbar Omidvarian, Sepehr Ghahraei, Atena Farangi, AmirBahador Boroumand
cs.AI

Abstract

Extracting clinical information from medical transcripts in low-resource languages remains a significant challenge in healthcare natural language processing (NLP). This study evaluates a two-step pipeline combining Aya-expanse-8B as a Persian-to-English translation model with five open-source small language models (SLMs) -- Qwen2.5-7B-Instruct, Llama-3.1-8B-Instruct, Llama-3.2-3B-Instruct, Qwen2.5-1.5B-Instruct, and Gemma-3-1B-it -- for binary extraction of 13 clinical features from 1,221 anonymized Persian transcripts collected at a cancer palliative care call center. Using a few-shot prompting strategy without fine-tuning, models were assessed on macro-averaged F1-score, Matthews Correlation Coefficient (MCC), sensitivity, and specificity to account for class imbalance. Qwen2.5-7B-Instruct achieved the highest overall performance (median macro-F1: 0.899; MCC: 0.797), while Gemma-3-1B-it showed the weakest results. Larger models (7B--8B parameters) consistently outperformed smaller counterparts in sensitivity and MCC. A bilingual analysis of Aya-expanse-8B revealed that translating Persian transcripts to English improved sensitivity, reduced missing outputs, and boosted metrics robust to class imbalance, though at the cost of slightly lower specificity and precision. Feature-level results showed reliable extraction of physiological symptoms across most models, whereas psychological complaints, administrative requests, and complex somatic features remained challenging. These findings establish a practical, privacy-preserving blueprint for deploying open-source SLMs in multilingual clinical NLP settings with limited infrastructure and annotation resources, and highlight the importance of jointly optimizing model scale and input language strategy for sensitive healthcare applications.
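The class-imbalance-robust metrics named above (macro-averaged F1 and MCC, alongside sensitivity and specificity) are all derived from the per-feature binary confusion counts. A minimal stdlib sketch of those definitions, assuming the standard formulas rather than the authors' actual evaluation code:

```python
import math

def binary_counts(y_true, y_pred):
    """Confusion counts for one binary feature (1 = present, 0 = absent)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(tp, tn, fp, fn):
    # Macro-average F1 over both classes: the negative class's F1 is
    # computed by swapping the roles of the confusion counts.
    return (f1(tp, fp, fn) + f1(tn, fn, fp)) / 2

def mcc(tp, tn, fp, fn):
    # Matthews Correlation Coefficient; 0.0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def sensitivity(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0

def specificity(tn, fp):
    return tn / (tn + fp) if (tn + fp) else 0.0
```

Under heavy class imbalance (e.g. a rarely mentioned symptom), a model that always predicts "absent" still scores high accuracy and specificity, while macro-F1 and MCC collapse toward chance, which is why the study leans on the latter two.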