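Continuous Adversarial Flow Models

Abstract

We propose continuous adversarial flow models, a type of continuous-time flow model trained with an adversarial objective. Unlike flow matching, which uses a fixed mean-squared-error criterion, our approach introduces a learned discriminator to guide training. This change in objective induces a different generalized distribution, which empirically produces samples that are better aligned with the target data distribution. Our method is primarily proposed for post-training existing flow-matching models, although it can also train models from scratch. On the ImageNet 256px generation task, our post-training substantially improves the guidance-free FID of latent-space SiT from 8.26 to 3.63 and of pixel-space JiT from 7.17 to 3.57. It also improves guided generation, reducing FID from 2.06 to 1.53 for SiT and from 1.86 to 1.80 for JiT. We further evaluate our approach on text-to-image generation, where it achieves improved results on both the GenEval and DPG benchmarks.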
