

Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification

September 19, 2025
Authors: Zinan Lin, Enshu Liu, Xuefei Ning, Junyi Zhu, Wenyu Wang, Sergey Yekhanin
cs.AI

Abstract

Generative modeling, representation learning, and classification are three core problems in machine learning (ML), yet their state-of-the-art (SoTA) solutions remain largely disjoint. In this paper, we ask: Can a unified principle address all three? Such unification could simplify ML pipelines and foster greater synergy across tasks. We introduce Latent Zoning Network (LZN) as a step toward this goal. At its core, LZN creates a shared Gaussian latent space that encodes information across all tasks. Each data type (e.g., images, text, labels) is equipped with an encoder that maps samples to disjoint latent zones, and a decoder that maps latents back to data. ML tasks are expressed as compositions of these encoders and decoders: for example, label-conditional image generation uses a label encoder and image decoder; image embedding uses an image encoder; classification uses an image encoder and label decoder. We demonstrate the promise of LZN in three increasingly complex scenarios: (1) LZN can enhance existing models (image generation): When combined with the SoTA Rectified Flow model, LZN improves FID on CIFAR10 from 2.76 to 2.59, without modifying the training objective. (2) LZN can solve tasks independently (representation learning): LZN can implement unsupervised representation learning without auxiliary loss functions, outperforming the seminal MoCo and SimCLR methods by 9.3% and 0.2%, respectively, on downstream linear classification on ImageNet. (3) LZN can solve multiple tasks simultaneously (joint generation and classification): With image and label encoders/decoders, LZN performs both tasks jointly by design, improving FID and achieving SoTA classification accuracy on CIFAR10. The code and trained models are available at https://github.com/microsoft/latent-zoning-networks. The project website is at https://zinanlin.me/blogs/latent_zoning_networks.html.
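To make the compositional idea concrete, below is a minimal, hypothetical sketch of how tasks can be built by chaining per-modality encoders and decoders over a shared latent space. The module names, toy network sizes, and the proximity-based label decoder are illustrative assumptions, not the authors' implementation; see the linked GitHub repository for the actual code.

```python
# Minimal sketch (assumptions, not the official LZN implementation):
# each data type has an encoder into a shared latent space and a decoder
# back to data; ML tasks are compositions of these modules.
import torch
import torch.nn as nn

LATENT_DIM = 128
NUM_CLASSES = 10
IMAGE_DIM = 3 * 32 * 32  # flattened CIFAR10-sized images, for illustration


class ImageEncoder(nn.Module):
    """Maps images to points in the shared latent space."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(IMAGE_DIM, 512), nn.ReLU(),
                                 nn.Linear(512, LATENT_DIM))

    def forward(self, x):
        return self.net(x.flatten(1))


class ImageDecoder(nn.Module):
    """Maps latents back to images."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT_DIM, 512), nn.ReLU(),
                                 nn.Linear(512, IMAGE_DIM))

    def forward(self, z):
        return self.net(z).view(-1, 3, 32, 32)


class LabelEncoder(nn.Module):
    """Maps class labels to representative points of their latent zones."""
    def __init__(self):
        super().__init__()
        self.zone_centers = nn.Embedding(NUM_CLASSES, LATENT_DIM)

    def forward(self, y):
        return self.zone_centers(y)


class LabelDecoder(nn.Module):
    """Scores labels for a latent, here by proximity to zone centers."""
    def __init__(self, label_encoder: LabelEncoder):
        super().__init__()
        self.label_encoder = label_encoder

    def forward(self, z):
        centers = self.label_encoder.zone_centers.weight  # (C, D)
        return -torch.cdist(z, centers)  # higher score = closer zone


# Tasks as compositions of encoders and decoders:
img_enc, img_dec = ImageEncoder(), ImageDecoder()
lbl_enc, lbl_dec = LabelEncoder(), LabelDecoder(lbl_enc)

x = torch.randn(4, 3, 32, 32)            # toy batch of images
y = torch.randint(0, NUM_CLASSES, (4,))  # toy batch of labels

embedding = img_enc(x)            # image embedding: image encoder only
logits = lbl_dec(img_enc(x))      # classification: image encoder + label decoder
generated = img_dec(lbl_enc(y))   # label-conditional generation: label encoder + image decoder
```

In the actual method these pieces would be trained so that encoders land in disjoint zones of a Gaussian latent space; the sketch only illustrates how the three tasks arise from composing the same set of modules.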