From Generalist to Specialist Representation

Abstract

Given a generalist model, learning a task-relevant specialist representation is fundamental for downstream applications. Identifiability, the asymptotic guarantee of recovering the ground-truth representation, is critical because it sets the ultimate limit of any model, even with infinite data and computation. We study this problem in a completely nonparametric setting, without relying on interventions, parametric forms, or structural constraints. We first prove that the structure between time steps and tasks is identifiable in a fully unsupervised manner, even when sequences lack strict temporal dependence and may exhibit disconnections, and task assignments can follow arbitrarily complex and interleaving structures. We then prove that, within each time step, the task-relevant latent representation can be disentangled from the irrelevant part under a simple sparsity regularization, without any additional information or parametric constraints. Together, these results establish a hierarchical foundation: task structure is identifiable across time steps, and task-relevant latent representations are identifiable within each step. To our knowledge, each result provides a first general nonparametric identifiability guarantee, and together they mark a step toward provably moving from generalist to specialist models.