Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment
May 24, 2025
Authors: Bryan Sangwoo Kim, Jeongsol Kim, Jong Chul Ye
cs.AI
Abstract
Modern single-image super-resolution (SISR) models deliver photo-realistic
results at the scale factors on which they are trained, but collapse when asked
to magnify far beyond that regime. We address this scalability bottleneck with
Chain-of-Zoom (CoZ), a model-agnostic framework that factorizes SISR into an
autoregressive chain of intermediate scale-states with multi-scale-aware
prompts. CoZ repeatedly reuses a backbone SR model, decomposing the
conditional probability into tractable sub-problems to achieve extreme
resolutions without additional training. Because visual cues diminish at high
magnifications, we augment each zoom step with multi-scale-aware text prompts
generated by a vision-language model (VLM). The prompt extractor itself is
fine-tuned using Generalized Reward Policy Optimization (GRPO) with a critic
VLM, aligning text guidance towards human preference. Experiments show that a
standard 4x diffusion SR model wrapped in CoZ attains beyond 256x enlargement
with high perceptual quality and fidelity. Project Page:
https://bryanswkim.github.io/chain-of-zoom/
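To make the autoregressive factorization concrete, the sketch below shows the chaining loop the abstract describes: a fixed 4x SR backbone applied four times (4^4 = 256x), with each step conditioned on a fresh multi-scale-aware prompt from a VLM. The names `sr_backbone` and `prompt_extractor` are hypothetical placeholders, not the authors' actual API; this is a minimal illustration of the idea, not the released implementation.

```python
def chain_of_zoom(image, sr_backbone, prompt_extractor, num_steps=4):
    """Illustrative Chain-of-Zoom loop (hypothetical interfaces).

    sr_backbone(image, prompt) -> image upscaled by a fixed factor (e.g. 4x)
    prompt_extractor(image)    -> multi-scale-aware text prompt from a VLM
    With a 4x backbone, num_steps=4 yields a 256x total magnification.
    """
    for _ in range(num_steps):
        # Visual cues diminish at high magnification, so each zoom step
        # is conditioned on a text prompt generated from the current
        # intermediate scale-state.
        prompt = prompt_extractor(image)
        # One step of the autoregressive chain: the same backbone SR
        # model is reused without any additional training.
        image = sr_backbone(image, prompt)
    return image
```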