SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing

Abstract

Traditional photographic image editing typically requires users to have sufficient aesthetic understanding to give appropriate instructions for adjusting image quality and camera parameters. However, this paradigm depends on humans explicitly stating their aesthetic intent, and such instructions are often ambiguous, incomplete, or inaccessible to non-expert users. In this work, we propose SmartPhotoCrafter, an automatic photographic image editing method that formulates image editing as a tightly coupled reasoning-to-generation process. The model first uses an Image Critic module to comprehend image quality and identify deficiencies; a Photographic Artist module then performs targeted edits to enhance image appeal, eliminating the need for explicit human instructions. We adopt a multi-stage training pipeline: (i) foundation pretraining to establish basic aesthetic understanding and editing capabilities, (ii) adaptation with reasoning-guided multi-edit supervision to incorporate rich semantic guidance, and (iii) coordinated reasoning-to-generation reinforcement learning to jointly optimize reasoning and generation. During training, SmartPhotoCrafter emphasizes photo-realistic image generation while supporting both image restoration and retouching tasks, with consistent adherence to color- and tone-related semantics. We also construct a stage-specific dataset that progressively builds reasoning and controllable generation, then effective cross-module collaboration, and ultimately high-quality photographic enhancement. Experiments demonstrate that SmartPhotoCrafter outperforms existing generative models on automatic photographic enhancement, producing photo-realistic results while exhibiting higher tonal sensitivity to retouching instructions. Project page: https://github.com/vivoCameraResearch/SmartPhotoCrafter.
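The abstract describes a two-module flow in which a critic reasons about the image before an editor acts on that reasoning. The sketch below illustrates only that control flow, under loud assumptions. In SmartPhotoCrafter both modules are learned models, whereas here the hypothetical classes `ImageCritic` and `PhotographicArtist` are simple luminance heuristics. Images are assumed to be flat lists of luminance values in [0, 1]. The critic's output is assumed to be a list of deficiencies plus a self-generated text instruction that takes the place of a user prompt. None of these names, thresholds, or data formats come from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical mapping from a detected deficiency to an edit instruction.
EDIT_FOR_DEFICIENCY = {
    "underexposed": "brighten",
    "overexposed": "darken",
    "low contrast": "increase contrast",
}


@dataclass
class Critique:
    """Critic output: detected deficiencies plus a self-generated instruction."""
    deficiencies: List[str] = field(default_factory=list)
    instruction: str = ""


class ImageCritic:
    """Toy stand-in for the learned Image Critic (the reasoning step)."""

    def __init__(self, target_mean: float = 0.5, min_spread: float = 0.4):
        self.target_mean = target_mean
        self.min_spread = min_spread

    def assess(self, pixels: List[float]) -> Critique:
        mean = sum(pixels) / len(pixels)
        spread = max(pixels) - min(pixels)
        critique = Critique()
        # Exposure check: flag images whose mean luminance is far from the target.
        if mean < self.target_mean - 0.1:
            critique.deficiencies.append("underexposed")
        elif mean > self.target_mean + 0.1:
            critique.deficiencies.append("overexposed")
        # Contrast check: flag images with a narrow luminance range.
        if spread < self.min_spread:
            critique.deficiencies.append("low contrast")
        # The instruction replaces the prompt a human user would otherwise write.
        critique.instruction = "; ".join(
            EDIT_FOR_DEFICIENCY[d] for d in critique.deficiencies
        ) or "keep as is"
        return critique


class PhotographicArtist:
    """Toy stand-in for the generative Photographic Artist (the generation step)."""

    def __init__(self, target_mean: float = 0.5, target_spread: float = 0.6):
        self.target_mean = target_mean
        self.target_spread = target_spread

    def edit(self, pixels: List[float], critique: Critique) -> List[float]:
        out = list(pixels)
        # Apply only the edits the critic asked for (targeted editing).
        if "low contrast" in critique.deficiencies:
            mean = sum(out) / len(out)
            spread = max(out) - min(out)
            gain = self.target_spread / spread if spread > 0 else 1.0
            out = [mean + (p - mean) * gain for p in out]  # stretch around the mean
        if {"underexposed", "overexposed"} & set(critique.deficiencies):
            shift = self.target_mean - sum(out) / len(out)
            out = [p + shift for p in out]  # move mean luminance to the target
        return [min(1.0, max(0.0, p)) for p in out]  # keep values in [0, 1]


def auto_enhance(
    pixels: List[float], critic: ImageCritic, artist: PhotographicArtist
) -> Tuple[Critique, List[float]]:
    """Reasoning-to-generation: critique first, then edit conditioned on it."""
    critique = critic.assess(pixels)
    return critique, artist.edit(pixels, critique)


if __name__ == "__main__":
    critique, enhanced = auto_enhance(
        [0.1, 0.15, 0.2, 0.25], ImageCritic(), PhotographicArtist()
    )
    print(critique.instruction)
    print([round(p, 3) for p in enhanced])
```

For the dark, flat input `[0.1, 0.15, 0.2, 0.25]`, the critic flags the image as underexposed and low contrast and writes the instruction `brighten; increase contrast`. The artist then returns values of approximately `[0.2, 0.4, 0.6, 0.8]`. A well-exposed input passes through unchanged with the instruction `keep as is`. Keeping the critique as a separate object mirrors the idea that the reasoning output conditions the generation. The joint optimization of both steps in stage (iii) is not something a fixed heuristic like this can show.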
