Understanding Generative AI Capabilities in Everyday Image Editing Tasks

May 22, 2025
Authors: Mohammad Reza Taesiri, Brandon Collins, Logan Bolton, Viet Dac Lai, Franck Dernoncourt, Trung Bui, Anh Totti Nguyen
cs.AI

Abstract

Generative AI (GenAI) holds significant promise for automating everyday image editing tasks, especially following the release of GPT-4o on March 25, 2025. However, what subjects do people most often want edited? What kinds of editing actions do they want to perform (e.g., removing or stylizing the subject)? Do people prefer precise edits with predictable outcomes or highly creative ones? By understanding the characteristics of real-world requests and the corresponding edits made by freelance photo-editing wizards, can we draw lessons for improving AI-based editors and determine which types of requests AI editors can currently handle successfully? In this paper, we present a unique study addressing these questions by analyzing 83k requests posted to the Reddit community over the past 12 years (2013-2025), together with the 305k PSR-wizard edits made in response. According to human ratings, only about 33% of requests can be fulfilled by the best AI editors (including GPT-4o, Gemini-2.0-Flash, and SeedEdit). Interestingly, AI editors perform worse on low-creativity requests that require precise editing than on more open-ended tasks. They often struggle to preserve the identity of people and animals, and frequently make touch-ups that were not requested. VLM judges (e.g., o1), on the other hand, behave differently from human judges and may prefer AI edits over human edits. Code and qualitative examples are available at: https://psrdataset.github.io
