

Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification

September 19, 2025
作者: Zinan Lin, Enshu Liu, Xuefei Ning, Junyi Zhu, Wenyu Wang, Sergey Yekhanin
cs.AI

Abstract
Generative modeling, representation learning, and classification are three core problems in machine learning (ML), yet their state-of-the-art (SoTA) solutions remain largely disjoint. In this paper, we ask: Can a unified principle address all three? Such unification could simplify ML pipelines and foster greater synergy across tasks. We introduce Latent Zoning Network (LZN) as a step toward this goal. At its core, LZN creates a shared Gaussian latent space that encodes information across all tasks. Each data type (e.g., images, text, labels) is equipped with an encoder that maps samples to disjoint latent zones, and a decoder that maps latents back to data. ML tasks are expressed as compositions of these encoders and decoders: for example, label-conditional image generation uses a label encoder and image decoder; image embedding uses an image encoder; classification uses an image encoder and label decoder. We demonstrate the promise of LZN in three increasingly complex scenarios: (1) LZN can enhance existing models (image generation): When combined with the SoTA Rectified Flow model, LZN improves FID on CIFAR10 from 2.76 to 2.59, without modifying the training objective. (2) LZN can solve tasks independently (representation learning): LZN can implement unsupervised representation learning without auxiliary loss functions, outperforming the seminal MoCo and SimCLR methods by 9.3% and 0.2%, respectively, on downstream linear classification on ImageNet. (3) LZN can solve multiple tasks simultaneously (joint generation and classification): With image and label encoders/decoders, LZN performs both tasks jointly by design, improving FID and achieving SoTA classification accuracy on CIFAR10. The code and trained models are available at https://github.com/microsoft/latent-zoning-networks. The project website is at https://zinanlin.me/blogs/latent_zoning_networks.html.
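The abstract's central idea, that tasks are compositions of per-modality encoders and decoders over a shared latent space, can be illustrated with a toy sketch. All function bodies below are hypothetical stand-ins (the real learned encoders/decoders are in the linked repository); only the compositional structure mirrors the paper's description.

```python
# Toy sketch of the LZN composition idea. Encoder/decoder internals here are
# hypothetical placeholders; in the paper they are learned networks over a
# shared Gaussian latent space partitioned into disjoint zones.

LATENT_DIM = 4

def label_encoder(label: int) -> list[float]:
    # Stand-in: place each label in its own latent "zone" by offsetting
    # a fixed direction per label.
    return [3.0 * label] * LATENT_DIM

def image_encoder(image: list[float]) -> list[float]:
    # Stand-in: a trivial projection of the "image" into the latent space.
    mean = sum(image) / len(image)
    return [mean] * LATENT_DIM

def image_decoder(latent: list[float]) -> list[float]:
    # Stand-in for a generative decoder (the paper pairs LZN with
    # rectified flow for image generation).
    return [v * 0.5 for v in latent]

def label_decoder(latent: list[float]) -> int:
    # Stand-in: recover the zone (label) whose center is nearest the latent.
    mean = sum(latent) / len(latent)
    return max(0, round(mean / 3.0))

# Tasks are compositions of the maps above:
def generate_image_for_label(label: int) -> list[float]:
    return image_decoder(label_encoder(label))   # label -> latent -> image

def embed_image(image: list[float]) -> list[float]:
    return image_encoder(image)                  # image -> latent

def classify_image(image: list[float]) -> int:
    return label_decoder(image_encoder(image))   # image -> latent -> label
```

The point of the sketch is the last three functions: once every modality has an encoder into (and decoder out of) the same latent space, conditional generation, embedding, and classification need no separate pipelines, only different compositions.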