ChatPaper.ai

4KAgent: Agentic Any Image to 4K Super-Resolution

July 9, 2025
Authors: Yushen Zuo, Qi Zheng, Mingyang Wu, Xinrui Jiang, Renjie Li, Jian Wang, Yide Zhang, Gengchen Mai, Lihong V. Wang, James Zou, Xiaoyu Wang, Ming-Hsuan Yang, Zhengzhong Tu
cs.AI

Abstract

We present 4KAgent, a unified agentic super-resolution generalist system designed to universally upscale any image to 4K resolution (and even higher, if applied iteratively). Our system can transform images from extremely low resolutions with severe degradations, for example, highly distorted inputs at 256x256, into crystal-clear, photorealistic 4K outputs. 4KAgent comprises three core components: (1) Profiling, a module that customizes the 4KAgent pipeline for bespoke use cases; (2) a Perception Agent, which leverages vision-language models alongside image quality assessment experts to analyze the input image and make a tailored restoration plan; and (3) a Restoration Agent, which executes the plan following a recursive execution-reflection paradigm, guided by a quality-driven mixture-of-experts policy that selects the optimal output at each step. Additionally, 4KAgent embeds a specialized face restoration pipeline that significantly enhances facial details in portrait and selfie photos. We rigorously evaluate 4KAgent across 11 distinct task categories encompassing a total of 26 diverse benchmarks, setting a new state of the art across a broad spectrum of imaging domains. Our evaluations cover natural images, portrait photos, AI-generated content, satellite imagery, fluorescence microscopy, and medical imaging such as fundoscopy, ultrasound, and X-ray, demonstrating superior performance on both perceptual (e.g., NIQE, MUSIQ) and fidelity (e.g., PSNR) metrics. By establishing a novel agentic paradigm for low-level vision tasks, we aim to catalyze broader interest and innovation in vision-centric autonomous agents across diverse research communities. We will release all the code, models, and results at: https://4kagent.github.io.
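The execution-reflection loop with quality-driven expert selection described for the Restoration Agent can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the 4KAgent implementation: the names `execute_with_reflection`, `toolbox`, `StepLog`, and `score` are hypothetical; each "expert" is assumed to be any image-to-image callable; and the quality score is assumed to be a single scalar where higher is better (for example, a blend of IQA metrics with lower-is-better ones such as NIQE negated). The sketch uses a plain loop over the plan and a simple "roll back if quality drops" reflection rule; how 4KAgent structures its recursion and its actual reflection criterion may differ.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence, Tuple

# Hypothetical type aliases (not from 4KAgent): an "image" is any object,
# an expert maps image -> image, and a scorer maps image -> scalar quality
# (higher is better).
Image = Any
Expert = Callable[[Image], Image]
Scorer = Callable[[Image], float]


@dataclass
class StepLog:
    task: str            # planned restoration task, e.g. "denoise" or "sr"
    chosen_expert: str   # name of the highest-scoring expert for this task
    score: float         # quality score of that expert's output
    accepted: bool       # False if reflection rolled the step back


def execute_with_reflection(
    image: Image,
    plan: Sequence[str],
    toolbox: Dict[str, Dict[str, Expert]],
    score: Scorer,
) -> Tuple[Image, List[StepLog]]:
    """Apply each planned task in order, keeping the best expert output per step."""
    current, current_score = image, score(image)
    log: List[StepLog] = []
    for task in plan:
        # Execution: every expert registered for this task proposes a candidate.
        candidates = {name: expert(current) for name, expert in toolbox[task].items()}
        # Quality-driven selection: score each candidate once, keep the best.
        scores = {name: score(img) for name, img in candidates.items()}
        best = max(scores, key=scores.__getitem__)
        # Reflection (assumed rule): accept the step only if quality does not drop;
        # otherwise keep the previous image and move on to the next task.
        accepted = scores[best] >= current_score
        if accepted:
            current, current_score = candidates[best], scores[best]
        log.append(StepLog(task, best, scores[best], accepted))
    return current, log


# Toy usage: "images" are plain floats and the score is the value itself.
toolbox = {
    "denoise": {"a": lambda x: x + 1.0, "b": lambda x: x + 3.0},
    "sr": {"c": lambda x: x - 5.0, "d": lambda x: x - 2.0},
}
final, steps = execute_with_reflection(0.0, ["denoise", "sr"], toolbox, score=float)
print(final, [(s.chosen_expert, s.accepted) for s in steps])
# prints: 3.0 [('b', True), ('d', False)]
```

In this toy run, the best super-resolution candidate scores below the already-denoised image, so reflection rejects that step and the denoised result is returned. Scoring each candidate exactly once matters in practice because real IQA models can be expensive to evaluate.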
July 10, 2025