
Preserving Privacy, Increasing Accessibility, and Reducing Cost: An On-Device Artificial Intelligence Model for Medical Transcription and Note Generation

July 3, 2025
Authors: Johnson Thomas, Ayush Mudgal, Wendao Liu, Nisten Tahiraj, Zeeshaan Mohammed, Dhruv Diddi
cs.AI

Abstract

Background: Clinical documentation represents a significant burden for healthcare providers, with physicians spending up to 2 hours daily on administrative tasks. Recent advances in large language models (LLMs) offer promising solutions, but privacy concerns and computational requirements limit their adoption in healthcare settings. Objective: To develop and evaluate a privacy-preserving, on-device medical transcription system using a fine-tuned Llama 3.2 1B model capable of generating structured medical notes from medical transcriptions while maintaining complete data sovereignty entirely in the browser. Methods: We fine-tuned a Llama 3.2 1B model using Parameter-Efficient Fine-Tuning (PEFT) with LoRA on 1,500 synthetic medical transcription-to-structured-note pairs. The model was evaluated against the base Llama 3.2 1B on two datasets: 100 endocrinology transcripts and 140 modified ACI benchmark cases. Evaluation employed both statistical metrics (ROUGE, BERTScore, BLEURT) and LLM-as-judge assessments across multiple clinical quality dimensions. Results: The fine-tuned OnDevice model demonstrated substantial improvements over the base model. On the ACI benchmark, ROUGE-1 scores increased from 0.346 to 0.496, while BERTScore F1 improved from 0.832 to 0.866. Clinical quality assessments showed a marked reduction in major hallucinations (from 85 to 35 cases) and enhanced factual correctness (2.81 to 3.54 on a 5-point scale). Similar improvements were observed on the internal evaluation dataset, with composite scores increasing from 3.13 to 4.43 (+41.5%). Conclusions: Fine-tuning compact LLMs for medical transcription yields clinically meaningful improvements while enabling complete on-device browser deployment. This approach addresses key barriers to AI adoption in healthcare: privacy preservation, cost reduction, and accessibility for resource-constrained environments.
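The Results section reports ROUGE-1 gains (0.346 to 0.496) for the fine-tuned model. ROUGE-1 F1 measures unigram overlap between a generated note and a reference note. The sketch below is a minimal, illustrative implementation of that metric, not the authors' evaluation code; published evaluations typically use an established package such as `rouge-score`, which also handles stemming and tokenization details omitted here.

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap ROUGE-1 F1 between a candidate and a reference text.

    Simplified sketch: whitespace tokenization, lowercasing, no stemming.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: each unigram counts at most as often as it
    # appears in the other text (multiset intersection).
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

# Hypothetical example strings, for illustration only:
score = rouge1_f1(
    "the patient reports fatigue",
    "the patient reports severe fatigue",
)
```

Here precision is 4/4 and recall is 4/5, giving an F1 of about 0.889; score differences like those in the abstract aggregate this statistic over the whole evaluation set.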
PDF · July 8, 2025