FASH-iCNN: Making Editorial Fashion Identity Inspectable Through Multimodal CNN Probing

Abstract

Fashion AI systems routinely encode the aesthetic logic of specific houses, editors, and historical moments without disclosing it. We present FASH-iCNN, a multimodal system trained on 87,547 Vogue runway images from 15 fashion houses spanning 1991–2024 that makes this cultural logic inspectable. Given a photograph of a garment, the system recovers which house produced it, which era it belongs to, and which color tradition it reflects. A clothing-only model identifies the fashion house at 78.2% top-1 accuracy across 14 houses, the decade at 88.6% top-1, and the specific year at 58.3% top-1 across 34 years, with a mean error of just 2.2 years. Probing which visual channels carry this signal reveals a sharp dissociation: removing color costs only 10.6 percentage points of house-identity accuracy, while removing texture costs 37.6 percentage points, establishing texture and luminance as the primary carriers of editorial identity. FASH-iCNN treats editorial culture as the signal rather than as background noise: it identifies which houses, eras, and color traditions shaped each output, so that users can see not just what the system predicts but also which houses, editors, and historical moments are encoded in that prediction.
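
The channel probing described in the abstract can be illustrated with a minimal sketch. The abstract does not say how color or texture was removed, so the two ablations below are assumptions chosen for illustration: color is removed by replacing RGB with a luminance image (Rec. 601 weights), and texture is removed by averaging over square tiles. The function names `remove_color`, `remove_texture`, `top1_accuracy`, and `drop_pp` are hypothetical and are not taken from FASH-iCNN.

```python
import numpy as np


def remove_color(img: np.ndarray) -> np.ndarray:
    """Assumed color ablation: replace RGB with luminance, keeping 3 channels."""
    weights = np.array([0.299, 0.587, 0.114])      # Rec. 601 luma weights
    luma = img.astype(np.float64) @ weights        # shape (H, W)
    return np.repeat(luma[..., None], 3, axis=2)   # shape (H, W, 3)


def remove_texture(img: np.ndarray, block: int = 8) -> np.ndarray:
    """Assumed texture ablation: average each block x block tile.

    Coarse color layout survives; fine detail inside each tile is lost.
    The image is cropped to a multiple of `block` in both dimensions.
    """
    h, w, c = img.shape
    h2, w2 = h - h % block, w - w % block
    tiles = img[:h2, :w2].astype(np.float64).reshape(
        h2 // block, block, w2 // block, block, c
    )
    means = tiles.mean(axis=(1, 3))                # one mean color per tile
    return np.repeat(np.repeat(means, block, axis=0), block, axis=1)


def top1_accuracy(pred, true) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    return float(np.mean(np.asarray(pred) == np.asarray(true)))


def drop_pp(baseline_acc: float, ablated_acc: float) -> float:
    """Accuracy lost to an ablation, in percentage points."""
    return 100.0 * (baseline_acc - ablated_acc)
```

Under these assumptions, the same classifier would be evaluated on the original images and on each ablated copy, and `drop_pp` would report the cost of each ablation in the units the abstract uses.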
