

Diffusion Models as Data Mining Tools

July 20, 2024
作者: Ioannis Siglidis, Aleksander Holynski, Alexei A. Efros, Mathieu Aubry, Shiry Ginosar
cs.AI

Abstract

This paper demonstrates how to use generative models trained for image synthesis as tools for visual data mining. Our insight is that since contemporary generative models learn an accurate representation of their training data, we can use them to summarize the data by mining for visual patterns. Concretely, we show that after finetuning conditional diffusion models to synthesize images from a specific dataset, we can use these models to define a typicality measure on that dataset. This measure assesses how typical visual elements are for different data labels, such as geographic location, time stamps, semantic labels, or even the presence of a disease. This analysis-by-synthesis approach to data mining has two key advantages. First, it scales much better than traditional correspondence-based approaches since it does not require explicitly comparing all pairs of visual elements. Second, while most previous works on visual data mining focus on a single dataset, our approach works on diverse datasets in terms of content and scale, including a historical car dataset, a historical face dataset, a large worldwide street-view dataset, and an even larger scene dataset. Furthermore, our approach allows for translating visual elements across class labels and analyzing consistent changes.
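The abstract says the finetuned conditional diffusion model is used to "define a typicality measure" for visual elements under a label. A natural reading of this analysis-by-synthesis idea is that an element is typical of a label if conditioning on that label helps the model predict the noise added to it, relative to an unconditional prediction. The sketch below illustrates that interpretation with a toy noise schedule; the function names, schedule, and `denoise(x_t, t, cond)` interface are hypothetical illustrations, not the paper's actual implementation.

```python
import numpy as np

def typicality(x, cond, denoise, n_steps=10, rng=None):
    """Toy typicality score for image patch `x` under label `cond`.

    Averages, over random diffusion times, how much conditioning on
    `cond` reduces the noise-prediction error versus the unconditional
    model (`cond=None`). A larger positive score means `x` is more
    "typical" of the label. `denoise(x_t, t, cond)` is a hypothetical
    stand-in for a finetuned conditional diffusion model's noise head.
    """
    rng = np.random.default_rng(rng)
    diffs = []
    for _ in range(n_steps):
        t = rng.uniform(0.1, 0.9)                     # random diffusion time
        eps = rng.standard_normal(x.shape)            # sampled Gaussian noise
        x_t = np.sqrt(1 - t) * x + np.sqrt(t) * eps   # noised input (toy schedule)
        err_uncond = np.mean((eps - denoise(x_t, t, None)) ** 2)
        err_cond = np.mean((eps - denoise(x_t, t, cond)) ** 2)
        diffs.append(err_uncond - err_cond)
    return float(np.mean(diffs))
```

Because the score only requires forward passes per element, it avoids the pairwise comparisons of correspondence-based mining, which is the scaling advantage the abstract claims.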

