JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent

Abstract

Photo retouching has become integral to contemporary visual storytelling, enabling users to capture aesthetics and express creativity. While professional tools such as Adobe Lightroom offer powerful capabilities, they demand substantial expertise and manual effort. In contrast, existing AI-based solutions provide automation but often suffer from limited adjustability and poor generalization, failing to meet diverse and personalized editing needs. To bridge this gap, we introduce JarvisArt, an agent driven by a multi-modal large language model (MLLM) that understands user intent, mimics the reasoning process of professional artists, and intelligently coordinates over 200 retouching tools within Lightroom. JarvisArt is trained in two stages: Chain-of-Thought supervised fine-tuning first establishes basic reasoning and tool-use skills, and Group Relative Policy Optimization for Retouching (GRPO-R) then further enhances its decision-making and tool proficiency. We also propose the Agent-to-Lightroom Protocol to facilitate seamless integration with Lightroom. To evaluate performance, we develop MMArt-Bench, a novel benchmark constructed from real-world user edits. JarvisArt demonstrates user-friendly interaction, superior generalization, and fine-grained control over both global and local adjustments, opening a new avenue for intelligent photo retouching. Notably, it outperforms GPT-4o with a 60% improvement in average pixel-level metrics on MMArt-Bench for content fidelity, while maintaining comparable instruction-following capabilities. Project Page: https://jarvisart.vercel.app/.
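The abstract does not describe the reward design of GRPO-R, so the following is only a minimal sketch of the group-relative advantage step that GRPO-style methods generally rely on: several candidate outputs are sampled for the same input, and each reward is normalized against the mean and standard deviation of its own group. The function name, the reward values, and the use of the population standard deviation are assumptions made for illustration, not details taken from JarvisArt.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar reward per candidate sampled for the same
    input. `eps` guards against division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four candidate retouching plans sampled for one
# photo and instruction. The values are made up for illustration.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
```

Candidates that score above the group average receive a positive advantage and are reinforced, while those below it receive a negative one, so no separate value network is needed to provide a baseline.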
