Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification

Abstract

Generative modeling, representation learning, and classification are three core problems in machine learning (ML), yet their state-of-the-art (SoTA) solutions remain largely disjoint. In this paper, we ask: can a unified principle address all three? Such unification could simplify ML pipelines and foster greater synergy across tasks. We introduce Latent Zoning Network (LZN) as a step toward this goal. At its core, LZN creates a shared Gaussian latent space that encodes information across all tasks. Each data type (e.g., images, text, labels) is equipped with an encoder that maps samples to disjoint latent zones and a decoder that maps latents back to data. ML tasks are expressed as compositions of these encoders and decoders: for example, label-conditional image generation uses a label encoder and an image decoder; image embedding uses an image encoder; classification uses an image encoder and a label decoder. We demonstrate the promise of LZN in three increasingly complex scenarios: (1) LZN can enhance existing models (image generation): when combined with the SoTA Rectified Flow model, LZN improves FID on CIFAR10 from 2.76 to 2.59 without modifying the training objective. (2) LZN can solve tasks independently (representation learning): LZN can implement unsupervised representation learning without auxiliary loss functions, outperforming the seminal MoCo and SimCLR methods by 9.3% and 0.2%, respectively, on downstream linear classification on ImageNet. (3) LZN can solve multiple tasks simultaneously (joint generation and classification): with image and label encoders/decoders, LZN performs both tasks jointly by design, improving FID and achieving SoTA classification accuracy on CIFAR10. The code and trained models are available at https://github.com/microsoft/latent-zoning-networks. The project website is at https://zinanlin.me/blogs/latent_zoning_networks.html.
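The way tasks are expressed as compositions of encoders and decoders can be illustrated with a toy sketch. This is not the LZN implementation: the learned networks are replaced by hand-written invertible functions, the shared Gaussian latent space is replaced by the interval [0, K) split into unit-length zones, and a toy "image" is just a (label, detail) pair. All function names are hypothetical and chosen only for illustration.

```python
import random

NUM_CLASSES = 10  # assumption: ten labels, as in CIFAR10


def label_encoder(label, rng):
    """Map a label to a random point inside its own zone [label, label + 1)."""
    return label + rng.random()


def label_decoder(z):
    """Map a latent back to the label whose zone contains it."""
    return min(int(z), NUM_CLASSES - 1)


def image_encoder(image):
    """Toy 'image' = (label, detail), detail in [0, 1); lands in the label's zone."""
    label, detail = image
    return label + detail


def image_decoder(z):
    """Invert image_encoder: recover the toy (label, detail) pair from a latent."""
    label = min(int(z), NUM_CLASSES - 1)
    return (label, z - label)


def generate(label, rng):
    """Label-conditional generation: label encoder, then image decoder."""
    return image_decoder(label_encoder(label, rng))


def embed(image):
    """Image embedding: image encoder only."""
    return image_encoder(image)


def classify(image):
    """Classification: image encoder, then label decoder."""
    return label_decoder(image_encoder(image))


if __name__ == "__main__":
    rng = random.Random(0)
    print(generate(3, rng))      # a toy image whose label component is 3
    print(embed((3, 0.25)))      # the latent assigned to this toy image
    print(classify((3, 0.25)))   # the label recovered through the shared latent
```

Because every data type reads from and writes to the same latent space, each task is just a different pairing of one encoder with one decoder; no task-specific model is needed in this sketch.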
