4KAgent: Agentic Any Image to 4K Super-Resolution

Abstract

We present 4KAgent, a unified agentic super-resolution generalist system designed to upscale any image to 4K resolution (and higher, if applied iteratively). The system can transform extremely low-resolution images with severe degradations, for example highly distorted 256x256 inputs, into sharp, photorealistic 4K outputs. 4KAgent comprises three core components: (1) Profiling, a module that customizes the 4KAgent pipeline for specific use cases; (2) a Perception Agent, which uses vision-language models alongside image quality assessment experts to analyze the input image and devise a tailored restoration plan; and (3) a Restoration Agent, which executes the plan following a recursive execution-reflection paradigm, guided by a quality-driven mixture-of-expert policy that selects the best output at each step. In addition, 4KAgent embeds a specialized face restoration pipeline that significantly enhances facial details in portrait and selfie photos. We rigorously evaluate 4KAgent across 11 distinct task categories encompassing a total of 26 diverse benchmarks, setting a new state of the art across a broad spectrum of imaging domains. Our evaluations cover natural images, portrait photos, AI-generated content, satellite imagery, fluorescence microscopy, and medical imaging such as fundoscopy, ultrasound, and X-ray, and demonstrate superior performance on both perceptual (e.g., NIQE, MUSIQ) and fidelity (e.g., PSNR) metrics. By establishing a novel agentic paradigm for low-level vision tasks, we aim to catalyze broader interest and innovation in vision-centric autonomous agents across diverse research communities. We will release all the code, models, and results at: https://4kagent.github.io.
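The per-step selection described for the Restoration Agent can be illustrated with a minimal sketch. This is not the authors' implementation: the names `run_plan`, `toy_quality`, and the toy experts are hypothetical, images are reduced to flat lists of floats, the recursive execution-reflection loop is simplified to a single pass over the plan, and the smoothness score merely stands in for real image quality assessment models.

```python
from typing import Callable, Dict, List, Sequence

Image = List[float]                 # toy stand-in for pixel data (assumption)
Expert = Callable[[Image], Image]   # one restoration tool (assumption)


def toy_quality(img: Image) -> float:
    """Hypothetical quality score: higher means smoother neighbouring values."""
    diffs = [abs(a - b) for a, b in zip(img, img[1:])]
    return -sum(diffs) / max(len(diffs), 1)


def run_plan(
    image: Image,
    plan: Sequence[str],
    experts: Dict[str, List[Expert]],
    quality: Callable[[Image], float],
) -> Image:
    """For each planned step, run every candidate expert and keep the best output."""
    current = image
    for step in plan:
        # Execution: every expert registered for this step produces a candidate.
        candidates = [expert(current) for expert in experts[step]]
        # Reflection: score the candidates and keep the highest-scoring one.
        current = max(candidates, key=quality)
    return current


def passthrough(img: Image) -> Image:
    return list(img)


def smooth(img: Image) -> Image:
    """3-tap mean filter with edge replication."""
    last = len(img) - 1
    return [
        (img[max(i - 1, 0)] + img[i] + img[min(i + 1, last)]) / 3.0
        for i in range(len(img))
    ]


def upscale_nearest(img: Image) -> Image:
    """2x upscaling by repeating each sample."""
    return [p for p in img for _ in range(2)]


def upscale_linear(img: Image) -> Image:
    """2x upscaling by inserting midpoints between neighbours."""
    out: Image = []
    for i, p in enumerate(img):
        nxt = img[i + 1] if i + 1 < len(img) else p
        out.extend([p, (p + nxt) / 2.0])
    return out


toy_experts = {
    "denoise": [passthrough, smooth],
    "upscale": [upscale_nearest, upscale_linear],
}
noisy = [0.0, 1.0, 0.0, 1.0]
restored = run_plan(noisy, ["denoise", "upscale"], toy_experts, toy_quality)
print(len(restored), toy_quality(restored) > toy_quality(noisy))
```

In this sketch the plan is a fixed list of step names; in the system described above, the plan would come from the Perception Agent's analysis of the input image.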
