Exploiting Diffusion Prior for Real-World Image Super-Resolution

Abstract

We present a novel approach that leverages the prior knowledge encapsulated in pre-trained text-to-image diffusion models for blind super-resolution (SR). Specifically, our time-aware encoder achieves promising restoration results without altering the pre-trained synthesis model, thereby preserving the generative prior and minimizing training cost. To remedy the loss of fidelity caused by the inherent stochasticity of diffusion models, we introduce a controllable feature wrapping module that lets users balance quality and fidelity by adjusting a single scalar value at inference time. Moreover, we develop a progressive aggregation sampling strategy that overcomes the fixed-size constraint of pre-trained diffusion models, enabling adaptation to arbitrary resolutions. A comprehensive evaluation on both synthetic and real-world benchmarks demonstrates the superiority of our method over current state-of-the-art approaches.
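The abstract does not spell out how progressive aggregation sampling works, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes the latent is split into overlapping fixed-size tiles, that a fixed-size denoiser (here a hypothetical `denoise_fn`, standing in for one update of the pre-trained diffusion model) is applied to each tile, and that the tile outputs are fused with normalized Gaussian weights so overlapping regions blend smoothly; the fusion is repeated at every sampling step. The helper names, `tile`, `stride`, and `sigma_ratio` are illustrative choices, not values from the paper.

```python
import numpy as np
from typing import Callable, Sequence


def gaussian_weight(tile: int, sigma_ratio: float = 0.3) -> np.ndarray:
    # 2-D Gaussian map peaking at the tile centre; sigma_ratio is an assumed value.
    coords = np.arange(tile) - (tile - 1) / 2.0
    sigma = sigma_ratio * tile
    g = np.exp(-(coords ** 2) / (2.0 * sigma ** 2))
    return np.outer(g, g)


def tile_starts(length: int, tile: int, stride: int) -> list[int]:
    # Start offsets of overlapping tiles that together cover [0, length).
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:
        starts.append(length - tile)  # last tile is flush with the border
    return starts


def aggregate_step(latent: np.ndarray,
                   denoise_fn: Callable[[np.ndarray], np.ndarray],
                   tile: int = 64, stride: int = 48) -> np.ndarray:
    # One sampling step: run the fixed-size denoiser on every tile and
    # fuse the overlapping outputs with normalized Gaussian weights.
    c, h, w = latent.shape
    if h < tile or w < tile:
        raise ValueError("latent must be at least tile x tile")
    weight = gaussian_weight(tile)
    out = np.zeros((c, h, w), dtype=np.float64)
    norm = np.zeros((h, w), dtype=np.float64)
    for y in tile_starts(h, tile, stride):
        for x in tile_starts(w, tile, stride):
            patch = latent[:, y:y + tile, x:x + tile]
            out[:, y:y + tile, x:x + tile] += denoise_fn(patch) * weight
            norm[y:y + tile, x:x + tile] += weight
    return out / norm  # norm > 0 everywhere because every pixel is covered


def progressive_sample(latent: np.ndarray,
                       denoise_fn: Callable[[np.ndarray, int], np.ndarray],
                       timesteps: Sequence[int],
                       tile: int = 64, stride: int = 48) -> np.ndarray:
    # Repeat the tiled aggregation at every timestep (the "progressive" part).
    for t in timesteps:
        latent = aggregate_step(latent, lambda p: denoise_fn(p, t), tile, stride)
    return latent


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.standard_normal((4, 100, 150))  # latent larger than one 64x64 tile
    # Toy stand-in denoiser: shrinks the latent by 10% per step.
    z0 = progressive_sample(z, lambda p, t: 0.9 * p, timesteps=range(10, 0, -1))
    print(z0.shape)  # (4, 100, 150)
```

Because the Gaussian weights are divided by their per-pixel sum, an identity denoiser leaves the latent unchanged, so any seams come only from disagreement between neighbouring tiles. In a real sampler, `denoise_fn` would wrap one scheduler update of the diffusion model at timestep `t`.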