High-Fidelity Image Compression with Score-based Generative Models

Abstract

Despite the tremendous success of diffusion generative models in text-to-image generation, replicating this success in the domain of image compression has proven difficult. In this paper, we demonstrate that diffusion can significantly improve perceptual quality at a given bit-rate, outperforming the state-of-the-art approaches PO-ELIC and HiFiC as measured by FID score. This is achieved with a simple but theoretically motivated two-stage approach that combines an autoencoder targeting MSE with a subsequent score-based decoder. However, as we will show, implementation details matter, and the optimal design decisions can differ greatly from those of typical text-to-image models.
