When, Where, and How: Adaptive Binning for Self-Supervised Learning on Tabular Data

Abstract

Medical tabular data are ubiquitous in clinical research, yet deep learning for tables remains underexplored in this domain: structured clinical variables are routinely available in tabular form, but reliable labels often require costly expert adjudication. Self-supervised learning (SSL) can leverage these unlabeled tables, and recent binning-based pretext tasks offer a promising inductive bias. However, existing objectives fix a single global quantile discretization and apply feature-agnostic supervision. We propose Adaptive Binning, a training-adaptive discretization pretext task for tabular SSL that couples discretization to learning through a feature-wise coarse-to-fine curriculum. Motivated by the spectral bias of neural networks and the principles of curriculum learning, our method progressively refines the discretization of each feature when a plateau is detected, and selects representation-aware splits to jointly improve value-space concentration and representation-space coherence. A heterogeneity-aware objective unifies categorical reconstruction with ordinal supervision for numerical features. Experiments on public medical tabular datasets under unified evaluation protocols show consistent gains for linear probing and fine-tuning without dataset-specific discretization tuning. We further introduce a medical tabular SSL benchmark with standardized protocols to support reproducible progress in this underexplored domain. Our code is available at https://github.com/labhai/Adaptive-Binning.
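
To make the feature-wise coarse-to-fine idea concrete, the following is a minimal sketch of plateau-triggered refinement for a single numerical feature. It is an illustration under stated assumptions, not the released implementation: the names `FeatureBinSchedule`, `has_plateaued`, `window`, and `tol` are hypothetical, the plateau criterion is a simple placeholder, and refinement here just doubles the number of quantile bins rather than performing the representation-aware split selection described above.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_edges(values, n_bins):
    """Interior cut points that split `values` into `n_bins` quantile bins."""
    if n_bins < 2:
        return []
    # Deduplicate so heavily tied features do not yield repeated cut points.
    return sorted(set(quantiles(values, n=n_bins, method="inclusive")))


def assign_bins(values, edges):
    """Map each value to its bin index in 0 .. len(edges)."""
    return [bisect_right(edges, v) for v in values]


def has_plateaued(losses, window=3, tol=1e-3):
    """Hypothetical criterion: the best loss over the last `window` steps
    improved on the best earlier loss by less than `tol`."""
    if len(losses) <= window:
        return False
    best_before = min(losses[:-window])
    best_recent = min(losses[-window:])
    return best_before - best_recent < tol


class FeatureBinSchedule:
    """Coarse-to-fine bin schedule for one numerical feature (assumed design)."""

    def __init__(self, values, start_bins=2, max_bins=16):
        self.values = list(values)
        self.n_bins = start_bins
        self.max_bins = max_bins
        self.losses = []
        self.edges = quantile_edges(self.values, self.n_bins)

    def step(self, loss):
        """Record this feature's pretext loss; refine if it has plateaued.

        Returns True when the discretization was refined on this step.
        """
        self.losses.append(loss)
        if self.n_bins < self.max_bins and has_plateaued(self.losses):
            # Stand-in refinement rule: double the bin count (assumption).
            self.n_bins = min(2 * self.n_bins, self.max_bins)
            self.edges = quantile_edges(self.values, self.n_bins)
            self.losses.clear()  # restart plateau tracking at the new resolution
            return True
        return False

    def targets(self):
        """Current bin-index targets for the pretext task."""
        return assign_bins(self.values, self.edges)


# Usage: the feature starts with 2 bins and is refined once its loss stalls.
schedule = FeatureBinSchedule(range(1, 101), start_bins=2)
for loss in [1.0, 0.6, 0.5, 0.5, 0.5, 0.5]:
    refined = schedule.step(loss)
```

Because each feature keeps its own schedule, easy features can move to fine bins early while harder ones stay coarse, which is the sense in which the curriculum is feature-wise. The bin indices returned by `targets()` are ordered, so they could serve as either classification or ordinal targets.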