FASH-iCNN: Making Editorial Fashion Identity Inspectable Through Multimodal CNN Probing
April 29, 2026
Authors: Morayo Danielle Adeyemi, Ryan A. Rossi, Franck Dernoncourt
cs.AI
Abstract
Fashion AI systems routinely encode the aesthetic logic of specific houses, editors, and historical moments without disclosing it. We present FASH-iCNN, a multimodal system trained on 87,547 Vogue runway images across 15 fashion houses spanning 1991-2024, which makes this cultural logic inspectable. Given a photograph of a garment, the system recovers which house produced it, which era it belongs to, and which color tradition it reflects. A clothing-only model identifies the fashion house at 78.2% top-1 accuracy across 14 houses, the decade at 88.6% top-1, and the specific year at 58.3% top-1 across 34 years, with a mean error of just 2.2 years. Probing which visual channels carry this signal reveals a sharp dissociation: removing color costs only 10.6 percentage points of house-identity accuracy, while removing texture costs 37.6 points, establishing texture and luminance as the primary carriers of editorial identity. FASH-iCNN treats editorial culture as the signal rather than background noise, identifying which houses, eras, and color traditions shaped each output, so users see not just what the system predicts but whose aesthetic logic is encoded in that prediction.
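The channel-ablation probe described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names are invented here, color removal is assumed to mean replacing RGB with replicated luminance (preserving texture and brightness), and texture removal is assumed to mean a low-pass box blur per channel (preserving color statistics while suppressing high-frequency detail). The trained classifier is then run on each ablated copy and the top-1 accuracy drop is compared.

```python
import numpy as np

# BT.601 luma weights, used here as an illustrative luminance definition.
LUMA = np.array([0.299, 0.587, 0.114])

def remove_color(img: np.ndarray) -> np.ndarray:
    """Ablate color: collapse RGB to luminance, then replicate it across
    all three channels so texture and brightness survive intact."""
    lum = img @ LUMA                      # (H, W)
    return np.repeat(lum[..., None], 3, axis=-1)

def remove_texture(img: np.ndarray, k: int = 7) -> np.ndarray:
    """Ablate texture: box-blur each channel with a k x k kernel, keeping
    coarse color fields but suppressing high-frequency detail."""
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape[:2]
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)
```

In an evaluation loop, each ablation would be applied to the test set before inference, and the gap between clean and ablated top-1 accuracy (10.6pp for color, 37.6pp for texture in the abstract's numbers) measures how much house identity each channel carries.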