Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification
September 19, 2025
Authors: Zinan Lin, Enshu Liu, Xuefei Ning, Junyi Zhu, Wenyu Wang, Sergey Yekhanin
cs.AI
Abstract
Generative modeling, representation learning, and classification are three
core problems in machine learning (ML), yet their state-of-the-art (SoTA)
solutions remain largely disjoint. In this paper, we ask: Can a unified
principle address all three? Such unification could simplify ML pipelines and
foster greater synergy across tasks. We introduce Latent Zoning Network (LZN)
as a step toward this goal. At its core, LZN creates a shared Gaussian latent
space that encodes information across all tasks. Each data type (e.g., images,
text, labels) is equipped with an encoder that maps samples to disjoint latent
zones, and a decoder that maps latents back to data. ML tasks are expressed as
compositions of these encoders and decoders: for example, label-conditional
image generation uses a label encoder and image decoder; image embedding uses
an image encoder; classification uses an image encoder and label decoder. We
demonstrate the promise of LZN in three increasingly complex scenarios: (1) LZN
can enhance existing models (image generation): When combined with the SoTA
Rectified Flow model, LZN improves FID on CIFAR10 from 2.76 to 2.59, without
modifying the training objective. (2) LZN can solve tasks independently
(representation learning): LZN can implement unsupervised representation
learning without auxiliary loss functions, outperforming the seminal MoCo and
SimCLR methods by 9.3% and 0.2%, respectively, on downstream linear
classification on ImageNet. (3) LZN can solve multiple tasks simultaneously
(joint generation and classification): With image and label encoders/decoders,
LZN performs both tasks jointly by design, improving FID and achieving SoTA
classification accuracy on CIFAR10. The code and trained models are available
at https://github.com/microsoft/latent-zoning-networks. The project website is
at https://zinanlin.me/blogs/latent_zoning_networks.html.
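To make the composition idea concrete, below is a minimal, hypothetical PyTorch sketch (not taken from the released repository) of how per-modality encoders and decoders sharing one latent space could be wired together for the three tasks named in the abstract. All module names, layer sizes, and architectures are illustrative assumptions, and the sketch omits how LZN actually constructs disjoint latent zones and trains the components.

```python
# Minimal sketch (assumed, not the authors' implementation): tasks as
# compositions of per-modality encoders and decoders over one shared latent space.
import torch
import torch.nn as nn

LATENT_DIM = 128  # assumed size of the shared Gaussian latent space


class ImageEncoder(nn.Module):
    """Maps images to points in the shared latent space (hypothetical stand-in)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, LATENT_DIM))

    def forward(self, x):
        return self.net(x)


class LabelEncoder(nn.Module):
    """Maps class labels to their latent zones (hypothetical stand-in)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.table = nn.Embedding(num_classes, LATENT_DIM)

    def forward(self, y):
        return self.table(y)


class ImageDecoder(nn.Module):
    """Maps latents back to image space (hypothetical stand-in)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(LATENT_DIM, 3 * 32 * 32)

    def forward(self, z):
        return self.net(z).view(-1, 3, 32, 32)


class LabelDecoder(nn.Module):
    """Maps latents to class logits (hypothetical stand-in)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.net = nn.Linear(LATENT_DIM, num_classes)

    def forward(self, z):
        return self.net(z)


image_enc, label_enc = ImageEncoder(), LabelEncoder()
image_dec, label_dec = ImageDecoder(), LabelDecoder()

x = torch.randn(4, 3, 32, 32)        # a batch of CIFAR10-sized images
y = torch.randint(0, 10, (4,))       # their class labels

# The three compositions described in the abstract:
embedding = image_enc(x)             # representation learning: image encoder only
logits = label_dec(image_enc(x))     # classification: image encoder -> label decoder
generated = image_dec(label_enc(y))  # label-conditional generation: label encoder -> image decoder
```

The point of the sketch is the design choice claimed in the abstract: because every task reads from and writes to the same latent space, a new task or modality only requires composing (or adding) an encoder/decoder pair rather than a separate model.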