Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification
September 19, 2025
Authors: Zinan Lin, Enshu Liu, Xuefei Ning, Junyi Zhu, Wenyu Wang, Sergey Yekhanin
cs.AI
Abstract
Generative modeling, representation learning, and classification are three
core problems in machine learning (ML), yet their state-of-the-art (SoTA)
solutions remain largely disjoint. In this paper, we ask: Can a unified
principle address all three? Such unification could simplify ML pipelines and
foster greater synergy across tasks. We introduce Latent Zoning Network (LZN)
as a step toward this goal. At its core, LZN creates a shared Gaussian latent
space that encodes information across all tasks. Each data type (e.g., images,
text, labels) is equipped with an encoder that maps samples to disjoint latent
zones, and a decoder that maps latents back to data. ML tasks are expressed as
compositions of these encoders and decoders: for example, label-conditional
image generation uses a label encoder and image decoder; image embedding uses
an image encoder; classification uses an image encoder and label decoder. We
demonstrate the promise of LZN in three increasingly complex scenarios: (1) LZN
can enhance existing models (image generation): When combined with the SoTA
Rectified Flow model, LZN improves FID on CIFAR10 from 2.76 to 2.59, without
modifying the training objective. (2) LZN can solve tasks independently
(representation learning): LZN can implement unsupervised representation
learning without auxiliary loss functions, outperforming the seminal MoCo and
SimCLR methods by 9.3% and 0.2%, respectively, on downstream linear
classification on ImageNet. (3) LZN can solve multiple tasks simultaneously
(joint generation and classification): With image and label encoders/decoders,
LZN performs both tasks jointly by design, improving FID and achieving SoTA
classification accuracy on CIFAR10. The code and trained models are available
at https://github.com/microsoft/latent-zoning-networks. The project website is
at https://zinanlin.me/blogs/latent_zoning_networks.html.
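The abstract's central idea, that ML tasks become compositions of per-modality encoders and decoders over a shared latent space, can be illustrated with a minimal sketch. All interfaces below (random-projection encoders, nearest-zone label decoding, dimensions) are hypothetical stand-ins for illustration only; they are not the paper's actual architecture or training procedure:

```python
# Hypothetical sketch of LZN-style task composition: each modality has an
# encoder into a shared latent space and a decoder back to data. The concrete
# encoder/decoder internals here are placeholders, not the paper's models.
import numpy as np

LATENT_DIM = 4

def image_encoder(image):
    # Stand-in: fixed random projection of a flattened image into latent space.
    rng = np.random.default_rng(0)
    W = rng.standard_normal((image.size, LATENT_DIM))
    return image.flatten() @ W

def label_encoder(label):
    # Stand-in: map a class label to the center of its latent zone.
    z = np.zeros(LATENT_DIM)
    z[label % LATENT_DIM] = 1.0
    return z

def image_decoder(latent, shape=(2, 2)):
    # Stand-in: fixed random projection from latent space back to image space.
    rng = np.random.default_rng(1)
    W = rng.standard_normal((LATENT_DIM, shape[0] * shape[1]))
    return (latent @ W).reshape(shape)

def label_decoder(latent, num_classes=10):
    # Stand-in: return the label whose zone center is nearest the latent.
    centers = np.stack([label_encoder(c) for c in range(num_classes)])
    return int(np.argmin(np.linalg.norm(centers - latent, axis=1)))

# Tasks expressed as compositions, mirroring the abstract's examples:
def generate_image_from_label(label):   # label-conditional image generation
    return image_decoder(label_encoder(label))

def embed_image(image):                 # representation learning
    return image_encoder(image)

def classify_image(image):              # classification
    return label_decoder(image_encoder(image))
```

The point of the sketch is only the composition pattern: adding a new task requires no new model, just a new pairing of an existing encoder with an existing decoder.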