CGM-JEPA: Learning Consistent Continuous Glucose Monitor Representations via Predictive Self-Supervised Pretraining

Abstract

Continuous glucose monitoring (CGM) can detect early metabolic subphenotypes (insulin resistance, IR; β-cell dysfunction), but population-scale deployment faces two coupled problems. First, the same physiological state appears through multiple views (CGM time series, venous OGTT, Glucodensity summaries), so single-view representations fail to transfer when deployment changes the modality or setting. Second, baselines perform inconsistently across these shifts. Both problems point to one remedy: representations that abstract away from any single view and capture higher-level temporal and distributional structure. We propose CGM-JEPA, a self-supervised pretraining framework that predicts masked latent representations rather than raw values, yielding an abstraction that transfers across modalities. X-CGM-JEPA adds a masked Glucodensity cross-view objective that supplies complementary distributional information. We pretrain on ~389k unlabeled CGM readings from 228 subjects and evaluate on two clinical cohorts (public-release subsets of N=27 and N=17) across three regimes (cohort generalization, venous-to-CGM transfer, home CGM) under 20 iterations of 2-fold cross-validation. X-CGM-JEPA ranks first or second on AUROC for both endpoints in all three regimes, which no baseline achieves, and exceeds the strongest baseline by up to +6.5 pp in cohort generalization and +3.6 pp in venous-to-CGM transfer (paired Wilcoxon, p<0.001). Under modality shift, it matches mean AUROC while redistributing performance toward weaker subgroups (the ethnicity AUROC gap shrinks by 25–54%); on sparse in-domain venous data, the distributional view improves label-aware clustering (ARI +39%, NMI +40%). Code and weights: https://github.com/cruiseresearchgroup/CGM-JEPA
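
To make the evaluation protocol concrete, the following is a minimal sketch of 20 iterations of 2-fold cross-validation scored by AUROC, which gives 40 fold-level scores per method. It is an illustration under stated assumptions, not the released evaluation code: the function names `auroc` and `repeated_two_fold` are hypothetical, the stratified split is an assumption (the abstract does not say whether folds are stratified), and the labels and features are synthetic placeholders sized to the N=27 cohort.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_two_fold(labels, n_iterations=20, seed=0):
    """Yield (train_idx, test_idx) for stratified 2-fold CV, repeated n_iterations times.

    Stratification is an assumption made here so every fold contains both classes.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    for _ in range(n_iterations):
        fold_a, fold_b = [], []
        for idx in by_class.values():
            shuffled = idx[:]
            rng.shuffle(shuffled)
            half = len(shuffled) // 2
            fold_a.extend(shuffled[:half])
            fold_b.extend(shuffled[half:])
        # Each iteration contributes two splits: train on one half, test on the other.
        yield sorted(fold_a), sorted(fold_b)
        yield sorted(fold_b), sorted(fold_a)


# Synthetic placeholder data: 27 subjects, one scalar feature per subject.
rng = random.Random(1)
labels = [1] * 13 + [0] * 14
features = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]

fold_aurocs = []
for train_idx, test_idx in repeated_two_fold(labels):
    # A real pipeline would fit a classifier on train_idx here; this placeholder
    # simply uses the raw feature as the score on the held-out half.
    y_test = [labels[i] for i in test_idx]
    s_test = [features[i] for i in test_idx]
    fold_aurocs.append(auroc(y_test, s_test))

mean_auroc = sum(fold_aurocs) / len(fold_aurocs)
print(f"{len(fold_aurocs)} folds, mean AUROC = {mean_auroc:.3f}")
```

Because the seed fixes the splits, two methods evaluated this way see the same folds, so their fold-level AUROCs can be compared with a paired test such as the Wilcoxon signed-rank test. Whether the reported test pairs scores at the fold level is not stated in the abstract.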