SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing

April 21, 2026
Authors: Ying Zeng, Miaosen Luo, Guangyuan Li, Yang Yang, Ruiyang Fan, Linxiao Shi, Qirui Yang, Jian Zhang, Chengcheng Liu, Siming Zheng, Jinwei Chen, Bo Li, Peng-Tao Jiang
cs.AI

Abstract

Traditional photographic image editing typically requires users to have enough aesthetic understanding to give appropriate instructions for adjusting image quality and camera parameters. However, this paradigm relies on explicit human instruction of aesthetic intent, which is often ambiguous, incomplete, or inaccessible to non-expert users. In this work, we propose SmartPhotoCrafter, an automatic photographic image editing method that formulates image editing as a tightly coupled reasoning-to-generation process. The model first uses the Image Critic module to assess image quality and identify deficiencies; the Photographic Artist module then applies targeted edits to enhance image appeal, eliminating the need for explicit human instructions. A multi-stage training pipeline is adopted: (i) foundation pretraining to establish basic aesthetic understanding and editing capabilities, (ii) adaptation with reasoning-guided multi-edit supervision to incorporate rich semantic guidance, and (iii) coordinated reasoning-to-generation reinforcement learning to jointly optimize reasoning and generation. During training, SmartPhotoCrafter emphasizes photo-realistic image generation while supporting both image restoration and retouching tasks with consistent adherence to color- and tone-related semantics. We also construct a stage-specific dataset, which progressively builds reasoning and controllable generation, effective cross-module collaboration, and ultimately high-quality photographic enhancement. Experiments demonstrate that SmartPhotoCrafter outperforms existing generative models on the task of automatic photographic enhancement, achieving photo-realistic results while exhibiting higher tonal sensitivity to retouching instructions. Project page: https://github.com/vivoCameraResearch/SmartPhotoCrafter.