SmartPhotoCrafter: Unified Reasoning, Generation, and Optimization for Automatic Photographic Image Editing

Abstract

Traditional photographic image editing typically requires users to have enough aesthetic understanding to give appropriate instructions for adjusting image quality and camera parameters. This paradigm relies on explicit human instruction of aesthetic intent, which is often ambiguous, incomplete, or hard for non-expert users to articulate. In this work, we propose SmartPhotoCrafter, an automatic photographic image editing method that formulates image editing as a tightly coupled reasoning-to-generation process. The model first uses an Image Critic module to assess image quality and identify deficiencies; a Photographic Artist module then applies targeted edits to enhance the image's appeal, eliminating the need for explicit human instructions. We adopt a multi-stage training pipeline: (i) foundation pretraining to establish basic aesthetic understanding and editing capabilities, (ii) adaptation with reasoning-guided multi-edit supervision to incorporate rich semantic guidance, and (iii) coordinated reasoning-to-generation reinforcement learning to jointly optimize reasoning and generation. During training, SmartPhotoCrafter emphasizes photo-realistic image generation while supporting both image restoration and retouching tasks, with consistent adherence to color- and tone-related semantics. We also construct stage-specific datasets that progressively strengthen reasoning and controllable generation, promote effective cross-module collaboration, and ultimately yield high-quality photographic enhancement. Experiments demonstrate that SmartPhotoCrafter outperforms existing generative models on automatic photographic enhancement, achieving photo-realistic results while exhibiting higher tonal sensitivity to retouching instructions. Project page: https://github.com/vivoCameraResearch/SmartPhotoCrafter.
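
The critique-then-edit control flow described above can be illustrated with a toy sketch. This is not the authors' implementation: the function names (image_critic, photographic_artist, auto_enhance), the Critique type, the thresholds, and the list-of-luminance image representation are all assumptions made for illustration. In the actual system both stages are learned models; here simple statistics and fixed adjustments stand in for them, only to show that the second stage is conditioned on the first stage's findings rather than on a user instruction.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List

Image = List[float]  # toy stand-in: per-pixel luminance in [0, 1]


@dataclass
class Critique:
    """Hypothetical structured output of the critic stage."""
    deficiencies: List[str]


def image_critic(image: Image) -> Critique:
    """Toy critic: flags exposure and contrast problems from simple statistics."""
    issues = []
    avg = mean(image)
    if avg < 0.35:            # illustrative threshold, not from the paper
        issues.append("underexposed")
    elif avg > 0.65:          # illustrative threshold, not from the paper
        issues.append("overexposed")
    if max(image) - min(image) < 0.3:
        issues.append("low_contrast")
    return Critique(issues)


def photographic_artist(image: Image, critique: Critique) -> Image:
    """Toy artist: applies one targeted edit per reported deficiency."""
    out = list(image)
    for issue in critique.deficiencies:
        if issue == "underexposed":
            out = [min(1.0, p + 0.2) for p in out]
        elif issue == "overexposed":
            out = [max(0.0, p - 0.2) for p in out]
        elif issue == "low_contrast":
            # Stretch values away from the mean, clamped to [0, 1].
            mid = mean(out)
            out = [min(1.0, max(0.0, mid + 1.5 * (p - mid))) for p in out]
    return out


def auto_enhance(image: Image) -> Image:
    """Critique first, then edit; no user instruction is passed in."""
    return photographic_artist(image, image_critic(image))
```

Calling auto_enhance on a dark, flat input triggers both the exposure and the contrast edit, while an input the toy critic finds no fault with is returned unchanged.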