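The Universal Normal Embedding

Abstract

Generative models and vision encoders have largely advanced on separate tracks, optimized for different goals and grounded in different mathematical principles. Yet they share a fundamental property: latent-space Gaussianity. Generative models map Gaussian noise to images, while encoders map images to semantic embeddings whose coordinates empirically behave as Gaussian. We hypothesize that both are views of a shared latent source, the **Universal Normal Embedding (UNE)**: an approximately Gaussian latent space from which encoder embeddings and DDIM-inverted noise arise as noisy linear projections. To test our hypothesis, we introduce **NoiseZoo**, a dataset of per-image latents comprising DDIM-inverted diffusion noise and matching encoder representations (CLIP, DINO). On CelebA, linear probes in both spaces yield strong, aligned attribute predictions, indicating that generative noise encodes meaningful semantics along linear directions. These directions further enable faithful, controllable edits (e.g., smile, gender, age) without architectural changes, and simple orthogonalization mitigates spurious entanglements between attributes. Taken together, our results provide empirical support for the UNE hypothesis and reveal a shared Gaussian-like latent geometry that concretely links encoding and generation. Code and data are available at https://rbetser.github.io/UNE/.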
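To make the orthogonalization idea concrete, here is a minimal sketch of editing a latent along one attribute direction after removing its component along a second, entangled direction. The abstract does not specify the exact procedure, so this assumes a single Gram-Schmidt projection step. The names `smile_dir`, `gender_dir`, `orthogonalize`, and `edit` are hypothetical, and the vectors are random stand-ins rather than real probe weights or DDIM-inverted noise.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def orthogonalize(direction, nuisance):
    """Remove from `direction` its component along `nuisance`.

    Assumption: one Gram-Schmidt projection step; the source only says
    "simple orthogonalization" without giving details.
    """
    n = normalize(nuisance)
    coeff = dot(direction, n)
    return normalize([d - coeff * a for d, a in zip(direction, n)])


def edit(latent, direction, strength):
    """Move a latent code along a direction by the given strength."""
    return [z + strength * d for z, d in zip(latent, direction)]


# Random stand-ins for probe directions and one latent code (illustrative only).
random.seed(0)
dim = 16
smile_dir = normalize([random.gauss(0, 1) for _ in range(dim)])
gender_dir = normalize([random.gauss(0, 1) for _ in range(dim)])
latent = [random.gauss(0, 1) for _ in range(dim)]

# Edit "smile" while leaving the projection onto "gender" unchanged.
smile_only = orthogonalize(smile_dir, gender_dir)
edited = edit(latent, smile_only, strength=2.0)

gender_before = dot(latent, gender_dir)
gender_after = dot(edited, gender_dir)
```

Because `smile_only` is orthogonal to `gender_dir`, the edit shifts the latent along the smile direction while its linear score on the second attribute stays the same up to floating-point error. In a real setting the directions would presumably come from the linear probes described above.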
