Noise May Contain Transferable Knowledge: Understanding Semi-supervised Heterogeneous Domain Adaptation from an Empirical Perspective

Abstract

Semi-supervised heterogeneous domain adaptation (SHDA) addresses learning across domains with distinct feature representations and distributions, where source samples are labeled while most target samples are unlabeled and only a small fraction are labeled. Moreover, there is no one-to-one correspondence between source and target samples. Although various SHDA methods have been developed to tackle this problem, the nature of the knowledge transferred across heterogeneous domains remains unclear. This paper investigates this question from an empirical perspective. We conduct extensive experiments on about 330 SHDA tasks, employing two supervised learning methods and seven representative SHDA methods. Surprisingly, our observations indicate that neither the category information nor the feature information of source samples significantly affects target-domain performance. Additionally, noise drawn from simple distributions, when used as source samples, may contain transferable knowledge. Based on this insight, we perform a series of experiments to uncover the underlying principles of transferable knowledge in SHDA. Specifically, we design a unified Knowledge Transfer Framework (KTF) for SHDA. Based on the KTF, we find that the transferable knowledge in SHDA primarily stems from the transferability and discriminability of the source domain. Consequently, ensuring these properties in source samples, regardless of their origin (e.g., image, text, noise), can enhance the effectiveness of knowledge transfer in SHDA tasks. The code and datasets are available at https://github.com/yyyaoyuan/SHDA.
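
To make the idea of noise used as source samples concrete, the following is a minimal sketch of how a labeled source domain could be built from Gaussian noise alone, together with a split of target labels into a small labeled subset and a large unlabeled remainder. The class-conditional Gaussian scheme, the function names `make_noise_source` and `split_target`, and all dimensions and parameters are assumptions chosen for illustration; they are not taken from the paper or its repository.

```python
import numpy as np


def make_noise_source(n_classes, n_per_class, dim, spread=1.0, seed=0):
    """Sample a labeled source domain consisting only of Gaussian noise.

    Hypothetical helper: each class gets its own random centre and samples
    are drawn around it, so the noise is class-discriminable by construction.
    """
    rng = np.random.default_rng(seed)
    centres = rng.normal(0.0, 5.0, size=(n_classes, dim))  # one centre per class
    labels = np.repeat(np.arange(n_classes), n_per_class)
    noise = rng.normal(0.0, spread, size=(labels.size, dim))
    return centres[labels] + noise, labels


def split_target(labels, n_labeled_per_class, seed=0):
    """Return a boolean mask that marks the few labeled target samples."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(labels.size, dtype=bool)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        chosen = rng.choice(idx, size=n_labeled_per_class, replace=False)
        mask[chosen] = True
    return mask


# Source: 10 classes of 300-dimensional noise (assumed sizes).
Xs, ys = make_noise_source(n_classes=10, n_per_class=100, dim=300)

# Target labels stand in for a real target domain with a different feature
# space; only 3 samples per class are treated as labeled (assumed value).
yt = np.repeat(np.arange(10), 50)
labeled_mask = split_target(yt, n_labeled_per_class=3)
```

In this sketch the source features share neither dimensionality nor content with any real target data, which mirrors the heterogeneous setting described in the abstract. Whether such noise actually helps on a given task depends on the SHDA method applied afterwards, which is not shown here.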

