FASH-iCNN: Making Editorial Fashion Identity Inspectable Through Multimodal CNN Probing

Abstract

Fashion AI systems routinely encode the aesthetic logic of specific houses, editors, and historical moments without disclosing it. We present FASH-iCNN, a multimodal system trained on 87,547 Vogue runway images from 15 fashion houses spanning 1991-2024, that makes this cultural logic inspectable. Given a photograph of a garment, the system recovers which house produced it, which era it belongs to, and which color tradition it reflects. A clothing-only model identifies the fashion house at 78.2% top-1 accuracy across 14 houses, the decade at 88.6% top-1, and the specific year at 58.3% top-1 across 34 years, with a mean error of just 2.2 years. Probing which visual channels carry this signal reveals a sharp dissociation: removing color costs only 10.6 percentage points of house-identity accuracy, while removing texture costs 37.6 percentage points, establishing texture and luminance as the primary carriers of editorial identity. FASH-iCNN treats editorial culture as the signal rather than background noise: it identifies which houses, eras, and color traditions shaped each output, so that users can see not just what the system predicts but also which houses, editors, and historical moments are encoded in that prediction.
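
The channel-probing comparison can be illustrated with a minimal sketch. The abstract does not state how color or texture is removed, so the two transforms below are assumptions chosen for illustration: grayscale conversion with Rec. 601 luminance weights stands in for color removal, and block averaging stands in for texture removal. All function names are hypothetical, and the numbers in the usage note are toy values, not results from the paper.

```python
import numpy as np

def remove_color(img):
    """Assumed color ablation: replace RGB with per-pixel luminance (Rec. 601 weights)."""
    lum = img @ np.array([0.299, 0.587, 0.114])      # (H, W, 3) -> (H, W)
    return np.repeat(lum[..., None], 3, axis=-1)     # back to 3 identical channels

def remove_texture(img, k=8):
    """Assumed texture ablation: average over k x k blocks, keeping only coarse color."""
    h, w, c = img.shape
    h2, w2 = h - h % k, w - w % k                    # crop to a multiple of k
    blocks = img[:h2, :w2].reshape(h2 // k, k, w2 // k, k, c).mean(axis=(1, 3))
    return np.repeat(np.repeat(blocks, k, axis=0), k, axis=1)

def top1_accuracy(pred, true):
    """Fraction of samples whose predicted label equals the true label."""
    return float(np.mean(np.asarray(pred) == np.asarray(true)))

def drop_in_pp(acc_full, acc_ablated):
    """Accuracy lost under an ablation, in percentage points."""
    return 100.0 * (acc_full - acc_ablated)
```

In use, one would score the same classifier on the original images and on each ablated copy, then compare the two drops. For example, `drop_in_pp(0.75, 0.5)` reports a loss of 25 percentage points for a toy model whose accuracy falls from 75% to 50%.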
