FASH-iCNN: Making Editorial Fashion Identity Inspectable Through Multimodal CNN Probing
April 29, 2026
Authors: Morayo Danielle Adeyemi, Ryan A. Rossi, Franck Dernoncourt
cs.AI
Abstract
Fashion AI systems routinely encode the aesthetic logic of specific houses, editors, and historical moments without disclosing it. We present FASH-iCNN, a multimodal system trained on 87,547 Vogue runway images across 15 fashion houses spanning 1991-2024 that makes this cultural logic inspectable. Given a photograph of a garment, the system recovers which house produced it, which era it belongs to, and which color tradition it reflects. A clothing-only model identifies the fashion house at 78.2% top-1 across 14 houses, the decade at 88.6% top-1, and the specific year at 58.3% top-1 across 34 years with a mean error of just 2.2 years. Probing which visual channels carry this signal reveals a sharp dissociation: removing color costs only 10.6pp of house identity accuracy, while removing texture costs 37.6pp, establishing texture and luminance as the primary carriers of editorial identity. FASH-iCNN treats editorial culture as the signal rather than background noise, identifying which houses, eras, and color traditions shaped each output, so that users see not just what the system predicts but whose aesthetic history is encoded in that prediction.
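The channel-ablation probe described above (removing color vs. removing texture and comparing accuracy drops) can be illustrated with a minimal sketch. The paper does not specify its exact ablation procedure, so the choices below are assumptions for illustration: color removal via ITU-R BT.601 luminance conversion, and texture removal via a heavy box blur that destroys fine detail while roughly preserving per-channel color statistics.

```python
import numpy as np

def remove_color(img: np.ndarray) -> np.ndarray:
    """Collapse RGB to luminance (BT.601 weights), keeping
    texture and brightness but discarding color information."""
    lum = img @ np.array([0.299, 0.587, 0.114])
    return np.repeat(lum[..., None], 3, axis=-1)

def remove_texture(img: np.ndarray, k: int = 15) -> np.ndarray:
    """Heavy k x k box blur: destroys fine texture while roughly
    preserving the image's color statistics (one simple ablation)."""
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape[:2]
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

# Synthetic stand-in for a runway photograph.
rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))

gray = remove_color(img)
blur = remove_texture(img)

# After color ablation, all three channels are identical.
assert np.allclose(gray[..., 0], gray[..., 2])
# After texture ablation, per-channel means are roughly preserved...
assert np.allclose(img.mean(axis=(0, 1)), blur.mean(axis=(0, 1)), atol=0.02)
# ...but local variation (texture) is greatly reduced.
assert blur.std() < 0.5 * img.std()
```

In the paper's setup, a classifier would be re-evaluated on each ablated version of the test set; the per-ablation accuracy drop (10.6pp without color vs. 37.6pp without texture) is what localizes editorial identity to texture and luminance.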