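Noise May Contain Transferable Knowledge: Understanding Semi-supervised Heterogeneous Domain Adaptation from an Empirical Perspective

Abstract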

Semi-supervised heterogeneous domain adaptation (SHDA) addresses learning across domains whose feature representations and distributions both differ. Source samples are labeled, whereas most target samples are unlabeled and only a small fraction carry labels. Moreover, there is no one-to-one correspondence between source and target samples. Although various SHDA methods have been developed to tackle this problem, the nature of the knowledge transferred across heterogeneous domains remains unclear. This paper investigates that question from an empirical perspective. We conduct extensive experiments on about 330 SHDA tasks, employing two supervised learning methods and seven representative SHDA methods. Surprisingly, our observations indicate that neither the category information nor the feature information of source samples significantly affects target-domain performance. Additionally, noise drawn from simple distributions, when used as source samples, may contain transferable knowledge. Based on this insight, we perform a series of experiments to uncover the underlying principles of transferable knowledge in SHDA. Specifically, we design a unified Knowledge Transfer Framework (KTF) for SHDA. Using the KTF, we find that the transferable knowledge in SHDA primarily stems from the transferability and discriminability of the source domain. Consequently, ensuring that source samples have these properties, regardless of their origin (e.g., image, text, noise), can enhance the effectiveness of knowledge transfer in SHDA tasks. The code and datasets are available at https://github.com/yyyaoyuan/SHDA.
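To make the idea of "noise as source samples" concrete, the following minimal sketch draws a labeled source set from class-conditional Gaussians. It is an illustration under stated assumptions, not the authors' procedure: the function name `make_noise_source`, the choice of Gaussian distributions, and the parameters `n_classes`, `n_per_class`, `dim`, and `sep` are all hypothetical. The repository linked above contains the actual implementation.

```python
import numpy as np


def make_noise_source(n_classes=10, n_per_class=50, dim=300, sep=5.0, seed=0):
    """Draw labeled noise samples, one Gaussian cluster per class.

    Hypothetical helper for illustration only. Each class c gets a random
    mean vector scaled by `sep`, and its samples are drawn from an isotropic
    unit-variance Gaussian around that mean, so a larger `sep` gives more
    clearly separated (more discriminable) classes.
    """
    rng = np.random.default_rng(seed)
    # One random mean per class; `sep` controls the spread between class means.
    means = sep * rng.standard_normal((n_classes, dim))
    # Labels 0..n_classes-1, each repeated n_per_class times.
    labels = np.repeat(np.arange(n_classes), n_per_class)
    # Each sample is its class mean plus unit-variance Gaussian noise.
    samples = means[labels] + rng.standard_normal((labels.size, dim))
    return samples, labels


X_src, y_src = make_noise_source()
print(X_src.shape, y_src.shape)  # (500, 300) (500,)
```

A set such as `(X_src, y_src)` could then stand in for the labeled source domain of an SHDA method, with the real labeled and unlabeled target samples left unchanged. Whether such noise helps in a given task is an empirical question of the kind the paper studies.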

