Understanding Generative AI Capabilities in Everyday Image Editing Tasks

Abstract

Generative AI (GenAI) holds significant promise for automating everyday image editing tasks, especially following the release of GPT-4o on March 25, 2025. However, what subjects do people most often want edited? What kinds of editing actions do they want to perform (e.g., removing or stylizing the subject)? Do people prefer precise edits with predictable outcomes or highly creative ones? By understanding the characteristics of real-world requests and the corresponding edits made by freelance photo-editing wizards, can we draw lessons for improving AI-based editors and determine which types of requests AI editors can currently handle successfully? In this paper, we present a unique study addressing these questions by analyzing 83k requests from the past 12 years (2013-2025) on the Reddit community, which collected 305k PSR-wizard edits. According to human ratings, only about 33% of requests can be fulfilled by the best AI editors (including GPT-4o, Gemini-2.0-Flash, and SeedEdit). Interestingly, AI editors perform worse on low-creativity requests that require precise editing than on more open-ended tasks. They often struggle to preserve the identity of people and animals, and frequently make touch-ups that were not requested. VLM judges (e.g., o1), by contrast, behave differently from human judges and may prefer AI edits over human edits. Code and qualitative examples are available at: https://psrdataset.github.io
