Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification
September 19, 2025
Authors: Zinan Lin, Enshu Liu, Xuefei Ning, Junyi Zhu, Wenyu Wang, Sergey Yekhanin
cs.AI
Abstract
Generative modeling, representation learning, and classification are three
core problems in machine learning (ML), yet their state-of-the-art (SoTA)
solutions remain largely disjoint. In this paper, we ask: Can a unified
principle address all three? Such unification could simplify ML pipelines and
foster greater synergy across tasks. We introduce Latent Zoning Network (LZN)
as a step toward this goal. At its core, LZN creates a shared Gaussian latent
space that encodes information across all tasks. Each data type (e.g., images,
text, labels) is equipped with an encoder that maps samples to disjoint latent
zones, and a decoder that maps latents back to data. ML tasks are expressed as
compositions of these encoders and decoders: for example, label-conditional
image generation uses a label encoder and image decoder; image embedding uses
an image encoder; classification uses an image encoder and label decoder. We
demonstrate the promise of LZN in three increasingly complex scenarios: (1) LZN
can enhance existing models (image generation): When combined with the SoTA
Rectified Flow model, LZN improves FID on CIFAR10 from 2.76 to 2.59, without
modifying the training objective. (2) LZN can solve tasks independently
(representation learning): LZN can implement unsupervised representation
learning without auxiliary loss functions, outperforming the seminal MoCo and
SimCLR methods by 9.3% and 0.2%, respectively, on downstream linear
classification on ImageNet. (3) LZN can solve multiple tasks simultaneously
(joint generation and classification): With image and label encoders/decoders,
LZN performs both tasks jointly by design, improving FID and achieving SoTA
classification accuracy on CIFAR10. The code and trained models are available
at https://github.com/microsoft/latent-zoning-networks. The project website is
at https://zinanlin.me/blogs/latent_zoning_networks.html.
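The abstract's central idea, that ML tasks become compositions of per-modality encoders and decoders over a shared latent space, can be illustrated with a minimal sketch. All interfaces below (random-projection encoders, nearest-zone label decoding, dimensions) are hypothetical stand-ins for illustration only; they are not the paper's actual architecture or training procedure:

```python
# Hypothetical sketch of LZN-style task composition: each modality has an
# encoder into a shared latent space and a decoder back to data. The concrete
# encoder/decoder internals here are placeholders, not the paper's models.
import numpy as np

LATENT_DIM = 4

def image_encoder(image):
    # Stand-in: fixed random projection of a flattened image into latent space.
    rng = np.random.default_rng(0)
    W = rng.standard_normal((image.size, LATENT_DIM))
    return image.flatten() @ W

def label_encoder(label):
    # Stand-in: map a class label to the center of its latent zone.
    z = np.zeros(LATENT_DIM)
    z[label % LATENT_DIM] = 1.0
    return z

def image_decoder(latent, shape=(2, 2)):
    # Stand-in: fixed random projection from latent space back to image space.
    rng = np.random.default_rng(1)
    W = rng.standard_normal((LATENT_DIM, shape[0] * shape[1]))
    return (latent @ W).reshape(shape)

def label_decoder(latent, num_classes=10):
    # Stand-in: return the label whose zone center is nearest the latent.
    centers = np.stack([label_encoder(c) for c in range(num_classes)])
    return int(np.argmin(np.linalg.norm(centers - latent, axis=1)))

# Tasks expressed as compositions, mirroring the abstract's examples:
def generate_image_from_label(label):   # label-conditional image generation
    return image_decoder(label_encoder(label))

def embed_image(image):                 # representation learning
    return image_encoder(image)

def classify_image(image):              # classification
    return label_decoder(image_encoder(image))
```

The point of the sketch is only the composition pattern: adding a new task requires no new model, just a new pairing of an existing encoder with an existing decoder.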