EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records
June 24, 2024
Authors: Yeonsu Kwon, Jiho Kim, Gyubok Lee, Seongsu Bae, Daeun Kyung, Wonchul Cha, Tom Pollard, Alistair Johnson, Edward Choi
cs.AI
Abstract
Electronic Health Records (EHRs) are integral for storing comprehensive
patient medical records, combining structured data (e.g., medications) with
detailed clinical notes (e.g., physician notes). These elements are essential
for straightforward data retrieval and provide deep, contextual insights into
patient care. However, they often suffer from discrepancies due to unintuitive
EHR system designs and human errors, posing serious risks to patient safety. To
address this, we developed EHRCon, a new dataset and task specifically designed
to ensure data consistency between structured tables and unstructured notes in
EHRs. EHRCon was crafted in collaboration with healthcare professionals using
the MIMIC-III EHR dataset, and includes manual annotations of 3,943 entities
across 105 clinical notes checked against database entries for consistency.
EHRCon has two versions, one using the original MIMIC-III schema, and another
using the OMOP CDM schema, in order to increase its applicability and
generalizability. Furthermore, leveraging the capabilities of large language
models, we introduce CheckEHR, a novel framework for verifying the consistency
between clinical notes and database tables. CheckEHR utilizes an eight-stage
process and shows promising results in both few-shot and zero-shot settings.
The code is available at https://github.com/dustn1259/EHRCon.
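To make the task concrete, here is a minimal sketch of what checking a single note entity against a structured table could look like. It is not the CheckEHR framework or the EHRCon annotation procedure, neither of which is specified in the abstract. The `prescriptions` table, its columns, the `NoteEntity` type, and the `is_consistent` helper are all hypothetical, and the rows are toy data, not MIMIC-III records.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class NoteEntity:
    """A claim extracted from a clinical note (hypothetical structure)."""
    patient_id: int
    drug: str
    date: str  # ISO date string, e.g. "2024-01-02"


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory database with a made-up prescriptions table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE prescriptions ("
        "patient_id INTEGER, drug TEXT, start_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO prescriptions VALUES (?, ?, ?)",
        [
            (1, "aspirin", "2024-01-02"),
            (1, "heparin", "2024-01-03"),
        ],
    )
    return conn


def is_consistent(conn: sqlite3.Connection, entity: NoteEntity) -> bool:
    """Return True if at least one table row matches the note entity.

    Assumption: an exact match on patient, case-insensitive drug name,
    and date counts as consistent. Real notes need far looser matching.
    """
    row = conn.execute(
        "SELECT 1 FROM prescriptions "
        "WHERE patient_id = ? AND LOWER(drug) = LOWER(?) AND start_date = ? "
        "LIMIT 1",
        (entity.patient_id, entity.drug, entity.date),
    ).fetchone()
    return row is not None


if __name__ == "__main__":
    conn = build_toy_db()
    # The note mentions aspirin on a date that the table also records.
    print(is_consistent(conn, NoteEntity(1, "Aspirin", "2024-01-02")))
    # The note mentions a drug that has no matching row in the table.
    print(is_consistent(conn, NoteEntity(1, "warfarin", "2024-01-02")))
```

The exact-match rule keeps the sketch short. In practice, note text carries abbreviations, relative dates, and values spread across several tables, which is the difficulty the dataset is built to expose.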