CGM-JEPA: Learning Consistent Continuous Glucose Monitor Representations via Predictive Self-Supervised Pretraining

Abstract

Continuous Glucose Monitoring (CGM) can detect early metabolic subphenotypes (insulin resistance, IR; β-cell dysfunction), but population-scale deployment faces two coupled problems. First, the same physiological state appears through multiple views (CGM time series, venous OGTT, Glucodensity summaries), so single-view representations fail to transfer when deployment shifts the modality or setting. Second, baselines perform inconsistently across these shifts. Both problems point to one remedy: representations that abstract away from any single view to capture higher-level temporal and distributional structure. We propose CGM-JEPA, a self-supervised pretraining framework that predicts masked latent representations rather than raw values, yielding abstractions that transfer across modalities. X-CGM-JEPA adds a masked Glucodensity cross-view objective to supply complementary distributional information. We pretrain on ~389k unlabeled CGM readings from 228 subjects and evaluate on two clinical cohorts (N=27 and N=17 public-release subsets) across three regimes (cohort generalization, venous-to-CGM transfer, home CGM) under 20-iteration × 2-fold cross-validation. X-CGM-JEPA ranks first or second on AUROC for both endpoints across all three regimes, which no baseline achieves, and exceeds the strongest baseline by up to +6.5 pp in cohort generalization and +3.6 pp in venous-to-CGM transfer (paired Wilcoxon, p<0.001). Under modality shift, it matches mean AUROC while redistributing performance toward weaker subgroups (the ethnicity AUROC gap shrinks by 25-54%); on sparse in-domain venous data, the distributional view improves label-aware clustering (ARI +39%, NMI +40%). Code and weights: https://github.com/cruiseresearchgroup/CGM-JEPA
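
The evaluation protocol named in the abstract (20 repetitions of 2-fold cross-validation, scored by AUROC) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names `auroc`, `repeated_stratified_2fold`, and `evaluate` are hypothetical, the `fit_predict` callable stands in for whatever downstream classifier is trained on the frozen representations, and stratifying the folds by label is an assumption that the abstract does not state.

```python
import random


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_stratified_2fold(labels, n_repeats=20, seed=0):
    """Yield (train_idx, test_idx) pairs: n_repeats reshuffles, 2 folds each.

    Assumption: folds are stratified by label so that both classes appear
    in every fold even for very small cohorts.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        fold_a, fold_b = [], []
        for cls in sorted(set(labels)):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)
            half = len(idx) // 2
            fold_a.extend(idx[:half])
            fold_b.extend(idx[half:])
        # Each half serves once as the training set and once as the test set.
        yield sorted(fold_a), sorted(fold_b)
        yield sorted(fold_b), sorted(fold_a)


def evaluate(features, labels, fit_predict, n_repeats=20, seed=0):
    """Return one test AUROC per split (2 * n_repeats values in total).

    fit_predict(train_x, train_y, test_x) is a hypothetical placeholder that
    trains a model and returns one score per test subject.
    """
    results = []
    for train, test in repeated_stratified_2fold(labels, n_repeats, seed):
        preds = fit_predict(
            [features[i] for i in train],
            [labels[i] for i in train],
            [features[i] for i in test],
        )
        results.append(auroc([labels[i] for i in test], preds))
    return results
```

With the default `n_repeats=20`, `evaluate` returns 40 AUROC values per model. If two models are run with the same `seed`, their values are aligned split by split, so they could be compared with a paired test such as `scipy.stats.wilcoxon`; whether the reported p-values were paired at this level is not stated in the abstract.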