Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification
September 19, 2025
Authors: Zinan Lin, Enshu Liu, Xuefei Ning, Junyi Zhu, Wenyu Wang, Sergey Yekhanin
cs.AI
Abstract
Generative modeling, representation learning, and classification are three
core problems in machine learning (ML), yet their state-of-the-art (SoTA)
solutions remain largely disjoint. In this paper, we ask: Can a unified
principle address all three? Such unification could simplify ML pipelines and
foster greater synergy across tasks. We introduce Latent Zoning Network (LZN)
as a step toward this goal. At its core, LZN creates a shared Gaussian latent
space that encodes information across all tasks. Each data type (e.g., images,
text, labels) is equipped with an encoder that maps samples to disjoint latent
zones, and a decoder that maps latents back to data. ML tasks are expressed as
compositions of these encoders and decoders: for example, label-conditional
image generation uses a label encoder and image decoder; image embedding uses
an image encoder; classification uses an image encoder and label decoder. We
demonstrate the promise of LZN in three increasingly complex scenarios: (1) LZN
can enhance existing models (image generation): When combined with the SoTA
Rectified Flow model, LZN improves FID on CIFAR10 from 2.76 to 2.59, without
modifying the training objective. (2) LZN can solve tasks independently
(representation learning): LZN can implement unsupervised representation
learning without auxiliary loss functions, outperforming the seminal MoCo and
SimCLR methods by 9.3% and 0.2%, respectively, on downstream linear
classification on ImageNet. (3) LZN can solve multiple tasks simultaneously
(joint generation and classification): With image and label encoders/decoders,
LZN performs both tasks jointly by design, improving FID and achieving SoTA
classification accuracy on CIFAR10. The code and trained models are available
at https://github.com/microsoft/latent-zoning-networks. The project website is
at https://zinanlin.me/blogs/latent_zoning_networks.html.
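To make the composition idea concrete, below is a minimal, hypothetical PyTorch sketch (not taken from the released repository) of how per-modality encoders and decoders sharing one latent space could be wired together for the three tasks named in the abstract. All module names, layer sizes, and architectures are illustrative assumptions, and the sketch omits how LZN actually constructs disjoint latent zones and trains the components.

```python
# Minimal sketch (assumed, not the authors' implementation): tasks as
# compositions of per-modality encoders and decoders over one shared latent space.
import torch
import torch.nn as nn

LATENT_DIM = 128  # assumed size of the shared Gaussian latent space


class ImageEncoder(nn.Module):
    """Maps images to points in the shared latent space (hypothetical stand-in)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, LATENT_DIM))

    def forward(self, x):
        return self.net(x)


class LabelEncoder(nn.Module):
    """Maps class labels to their latent zones (hypothetical stand-in)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.table = nn.Embedding(num_classes, LATENT_DIM)

    def forward(self, y):
        return self.table(y)


class ImageDecoder(nn.Module):
    """Maps latents back to image space (hypothetical stand-in)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(LATENT_DIM, 3 * 32 * 32)

    def forward(self, z):
        return self.net(z).view(-1, 3, 32, 32)


class LabelDecoder(nn.Module):
    """Maps latents to class logits (hypothetical stand-in)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.net = nn.Linear(LATENT_DIM, num_classes)

    def forward(self, z):
        return self.net(z)


image_enc, label_enc = ImageEncoder(), LabelEncoder()
image_dec, label_dec = ImageDecoder(), LabelDecoder()

x = torch.randn(4, 3, 32, 32)        # a batch of CIFAR10-sized images
y = torch.randint(0, 10, (4,))       # their class labels

# The three compositions described in the abstract:
embedding = image_enc(x)             # representation learning: image encoder only
logits = label_dec(image_enc(x))     # classification: image encoder -> label decoder
generated = image_dec(label_enc(y))  # label-conditional generation: label encoder -> image decoder
```

The point of the sketch is the design choice claimed in the abstract: because every task reads from and writes to the same latent space, a new task or modality only requires composing (or adding) an encoder/decoder pair rather than a separate model.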