

Preserving Privacy, Increasing Accessibility, and Reducing Cost: An On-Device Artificial Intelligence Model for Medical Transcription and Note Generation

July 3, 2025
Authors: Johnson Thomas, Ayush Mudgal, Wendao Liu, Nisten Tahiraj, Zeeshaan Mohammed, Dhruv Diddi
cs.AI

Abstract

Background: Clinical documentation represents a significant burden for healthcare providers, with physicians spending up to 2 hours daily on administrative tasks. Recent advances in large language models (LLMs) offer promising solutions, but privacy concerns and computational requirements limit their adoption in healthcare settings. Objective: To develop and evaluate a privacy-preserving, on-device medical transcription system using a fine-tuned Llama 3.2 1B model capable of generating structured medical notes from medical transcriptions while maintaining complete data sovereignty entirely in the browser. Methods: We fine-tuned a Llama 3.2 1B model using Parameter-Efficient Fine-Tuning (PEFT) with LoRA on 1,500 synthetic medical transcription-to-structured note pairs. The model was evaluated against the base Llama 3.2 1B on two datasets: 100 endocrinology transcripts and 140 modified ACI benchmark cases. Evaluation employed both statistical metrics (ROUGE, BERTScore, BLEURT) and LLM-as-judge assessments across multiple clinical quality dimensions. Results: The fine-tuned OnDevice model demonstrated substantial improvements over the base model. On the ACI benchmark, ROUGE-1 scores increased from 0.346 to 0.496, while BERTScore F1 improved from 0.832 to 0.866. Clinical quality assessments showed marked reduction in major hallucinations (from 85 to 35 cases) and enhanced factual correctness (2.81 to 3.54 on 5-point scale). Similar improvements were observed on the internal evaluation dataset, with composite scores increasing from 3.13 to 4.43 (+41.5%). Conclusions: Fine-tuning compact LLMs for medical transcription yields clinically meaningful improvements while enabling complete on-device browser deployment. This approach addresses key barriers to AI adoption in healthcare: privacy preservation, cost reduction, and accessibility for resource-constrained environments.
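The PEFT/LoRA approach described in the Methods keeps the pretrained weights frozen and learns only a small low-rank correction per layer. A minimal sketch of the underlying math (the function and variable names are illustrative, not from the paper): a linear layer's output becomes the frozen projection plus a rank-r update B·A scaled by alpha/r.

```python
import numpy as np

def lora_linear(x, W, A, B, alpha=16, r=8):
    """Hypothetical LoRA-adapted linear layer: frozen weight W plus a
    trainable low-rank update B @ A, scaled by alpha / r."""
    base = x @ W.T            # frozen pretrained projection (d_in -> d_out)
    update = x @ A.T @ B.T    # trainable rank-r path: A (r, d_in), B (d_out, r)
    return base + (alpha / r) * update
```

Because B is typically initialized to zeros, the adapted layer starts out identical to the base model; only the small A and B matrices (a tiny fraction of the 1B parameters) receive gradient updates during fine-tuning.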
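The ROUGE-1 scores reported in the Results measure unigram overlap between generated and reference notes. A minimal sketch of ROUGE-1 F1 (standard toolkits add stemming and other normalization; this bare version is illustrative only):

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall,
    with repeated tokens clipped via multiset intersection."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, a candidate that reproduces two of a six-word reference's tokens scores precision 1.0 but recall 1/3, giving F1 = 0.5; the paper's jump from 0.346 to 0.496 on the ACI benchmark reflects this kind of increased lexical overlap with reference notes.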
PDF · July 8, 2025