Learned Compression for Compressed Learning

Abstract

Modern sensors produce increasingly rich streams of high-resolution data. Due to resource constraints, machine learning systems discard the vast majority of this information by reducing resolution. Compressed-domain learning lets models operate on compact latent representations, enabling higher effective resolution for the same budget. However, existing compression systems are not ideal for compressed learning. Linear transform coding and end-to-end learned compression systems reduce bitrate but do not uniformly reduce dimensionality, so they do not meaningfully increase efficiency. Generative autoencoders reduce dimensionality, but their adversarial or perceptual objectives lead to significant information loss. To address these limitations, we introduce WaLLoC (Wavelet Learned Lossy Compression), a neural codec architecture that combines linear transform coding with nonlinear dimensionality-reducing autoencoders. WaLLoC places a shallow, asymmetric autoencoder and an entropy bottleneck between the forward and inverse stages of an invertible wavelet packet transform. Across several key metrics, WaLLoC outperforms the autoencoders used in state-of-the-art latent diffusion models. WaLLoC does not require perceptual or adversarial losses to represent high-frequency detail, which makes it compatible with modalities beyond RGB images and stereo audio. WaLLoC's encoder consists almost entirely of linear operations, making it exceptionally efficient and suitable for mobile computing, remote sensing, and learning directly from compressed data. We demonstrate WaLLoC's capability for compressed-domain learning across several tasks, including image classification, colorization, document understanding, and music source separation. Our code, experiments, and pre-trained audio and image codecs are available at https://ut-sysml.org/walloc.
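To make the role of the invertible wavelet packet transform concrete, the following is a minimal NumPy sketch and not WaLLoC's actual implementation. The Haar filter, the function names, and the tensor shapes are all assumptions chosen for illustration. It shows that such a transform is linear and exactly invertible, and that it trades spatial resolution for channels without changing the number of elements. Under this reading, any reduction in dimensionality has to come from the autoencoder placed between the analysis and synthesis stages.

```python
import numpy as np

def haar_analysis(x):
    """One level of a 2D orthonormal Haar transform (assumed filter choice).
    x has shape (C, H, W) with even H and W; the result has shape (4*C, H/2, W/2)."""
    a = x[:, 0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[:, 0::2, 1::2]  # top-right
    c = x[:, 1::2, 0::2]  # bottom-left
    d = x[:, 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0
    lh = (a - b + c - d) / 2.0
    hl = (a + b - c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return np.concatenate([ll, lh, hl, hh], axis=0)

def haar_synthesis(y):
    """Exact inverse of haar_analysis."""
    ll, lh, hl, hh = np.split(y, 4, axis=0)
    channels, h, w = ll.shape
    x = np.empty((channels, 2 * h, 2 * w), dtype=y.dtype)
    x[:, 0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[:, 0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[:, 1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[:, 1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x

def wavelet_packet(x, levels):
    """Wavelet packet analysis: every subband is split again at each level."""
    for _ in range(levels):
        x = haar_analysis(x)
    return x

def inverse_wavelet_packet(y, levels):
    """Undo wavelet_packet by applying the synthesis step once per level."""
    for _ in range(levels):
        y = haar_synthesis(y)
    return y

# Hypothetical input: a 3-channel 16x16 image with random values.
rng = np.random.default_rng(0)
image = rng.standard_normal((3, 16, 16))
coeffs = wavelet_packet(image, levels=2)
restored = inverse_wavelet_packet(coeffs, levels=2)

print(image.shape, "->", coeffs.shape)          # spatial size shrinks, channels grow
print("same element count:", image.size == coeffs.size)
print("perfect reconstruction:", np.allclose(image, restored))
```

In a codec built along these lines, a learned encoder would map the many subband channels to a smaller number of latent channels before the entropy bottleneck, and a decoder would map them back before the synthesis step. Those learned components are omitted here.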
