

4KAgent: Agentic Any Image to 4K Super-Resolution

July 9, 2025
Authors: Yushen Zuo, Qi Zheng, Mingyang Wu, Xinrui Jiang, Renjie Li, Jian Wang, Yide Zhang, Gengchen Mai, Lihong V. Wang, James Zou, Xiaoyu Wang, Ming-Hsuan Yang, Zhengzhong Tu
cs.AI

Abstract

We present 4KAgent, a unified agentic super-resolution generalist system designed to universally upscale any image to 4K resolution (and even higher, if applied iteratively). Our system can transform images from extremely low resolutions with severe degradations, for example highly distorted inputs at 256x256, into crystal-clear, photorealistic 4K outputs. 4KAgent comprises three core components: (1) Profiling, a module that customizes the 4KAgent pipeline for bespoke use cases; (2) a Perception Agent, which leverages vision-language models alongside image quality assessment experts to analyze the input image and make a tailored restoration plan; and (3) a Restoration Agent, which executes the plan following a recursive execution-reflection paradigm, guided by a quality-driven mixture-of-experts policy that selects the optimal output at each step. Additionally, 4KAgent embeds a specialized face restoration pipeline that significantly enhances facial details in portrait and selfie photos. We rigorously evaluate 4KAgent across 11 distinct task categories encompassing a total of 26 diverse benchmarks, setting a new state of the art across a broad spectrum of imaging domains. Our evaluations cover natural images, portrait photos, AI-generated content, satellite imagery, fluorescence microscopy, and medical imaging such as fundoscopy, ultrasound, and X-ray, demonstrating superior performance on both perceptual (e.g., NIQE, MUSIQ) and fidelity (e.g., PSNR) metrics. By establishing a novel agentic paradigm for low-level vision tasks, we aim to catalyze broader interest and innovation in vision-centric autonomous agents across diverse research communities. We will release all code, models, and results at: https://4kagent.github.io.
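The Restoration Agent's loop (run several candidate experts for a planned step, assess each output, keep the best) can be illustrated with a minimal, hypothetical sketch. This is not 4KAgent's implementation: the names `restore`, `toy_quality`, `smooth`, `upscale_nearest_x2`, and `TOOLBOX`, the toy nested-list image type, and the rule of skipping a step when no candidate improves the score are all assumptions made for illustration. In the real system, learned restoration models and image quality assessment experts would take the place of these toy tools and the toy score.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy types: a grayscale image as nested lists of floats.
Image = List[List[float]]
Tool = Callable[[Image], Image]


def identity(img: Image) -> Image:
    """Candidate that leaves the image unchanged (a 'do nothing' expert)."""
    return [row[:] for row in img]


def smooth(img: Image) -> Image:
    """Toy 'denoiser': pull every pixel halfway toward the image mean."""
    flat = [p for row in img for p in row]
    mean = sum(flat) / len(flat)
    return [[(p + mean) / 2 for p in row] for row in img]


def upscale_nearest_x2(img: Image) -> Image:
    """Toy 'super-resolver': nearest-neighbour 2x upscaling."""
    return [[p for p in row for _ in range(2)] for row in img for _ in range(2)]


def toy_quality(img: Image) -> float:
    """Stand-in quality score (higher is better): pixel count minus
    horizontal roughness. Real IQA experts would replace this."""
    pixels = len(img) * len(img[0])
    roughness = sum(abs(a - b) for row in img for a, b in zip(row, row[1:]))
    return pixels - roughness


def restore(
    image: Image,
    plan: Sequence[str],
    toolbox: Dict[str, List[Tuple[str, Tool]]],
    quality: Callable[[Image], float],
) -> Tuple[Image, List[str]]:
    """For each planned task, execute every candidate tool, score each
    output (reflection), and keep the best one only if it raises the score."""
    current, current_score = image, quality(image)
    log: List[str] = []
    for task in plan:
        best_name, best_img, best_score = "", current, float("-inf")
        for name, tool in toolbox[task]:
            out = tool(current)       # execution: run one expert
            score = quality(out)      # reflection: assess its result
            if score > best_score:
                best_name, best_img, best_score = name, out, score
        if best_score > current_score:  # accept only a quality gain
            current, current_score = best_img, best_score
            log.append(f"{task}:{best_name}")
        else:
            log.append(f"{task}:skipped")
    return current, log


TOOLBOX = {
    "denoise": [("smooth", smooth), ("identity", identity)],
    "upscale": [("nearest_x2", upscale_nearest_x2), ("identity", identity)],
}

result, log = restore([[0.0, 1.0], [1.0, 0.0]], ["denoise", "upscale"], TOOLBOX, toy_quality)
print(log, len(result))  # → ['denoise:smooth', 'upscale:nearest_x2'] 4
```

Scoring every candidate and keeping the best one trades extra compute for robustness: no single expert has to be right for every input, and in this sketch a step whose best output would not improve the score is skipped rather than applied.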