PHI-S: Distribution Balancing for Label-Free Multi-Teacher Distillation

October 2, 2024
Authors: Mike Ranzinger, Jon Barker, Greg Heinrich, Pavlo Molchanov, Bryan Catanzaro, Andrew Tao
cs.AI

Abstract

Various visual foundation models have distinct strengths and weaknesses, both of which can be improved through heterogeneous multi-teacher knowledge distillation without labels, termed "agglomerative models." We build upon this body of work by studying the effect of the teachers' activation statistics, particularly the impact of the loss function on the resulting student model quality. We explore a standard toolkit of statistical normalization techniques to better align the different distributions and assess their effects. Further, we examine the impact on downstream teacher-matching metrics, which motivates the use of Hadamard matrices. With these matrices, we demonstrate useful properties, showing how they can be used for isotropic standardization, where each dimension of a multivariate distribution is standardized using the same scale. We call this technique "PHI Standardization" (PHI-S) and empirically demonstrate that it produces the best student model across the suite of methods studied.
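The isotropic standardization described above can be sketched as follows. This is an illustrative reconstruction from the abstract, not the paper's reference implementation; the function names `phi_standardize` and `sylvester_hadamard` are hypothetical. The idea: rotating into the covariance eigenbasis and then applying a normalized Hadamard matrix spreads the total variance evenly across all dimensions, so a single scalar scale standardizes every dimension at once.

```python
import numpy as np

def sylvester_hadamard(d):
    # Sylvester construction; requires d to be a power of two.
    H = np.array([[1.0]])
    while H.shape[0] < d:
        H = np.block([[H, H], [H, -H]])
    assert H.shape[0] == d, "d must be a power of two"
    return H / np.sqrt(d)  # orthonormal: H @ H.T == I

def phi_standardize(X):
    """Sketch of PHI-S: rotate so every dimension carries the same
    variance, then divide by one global (isotropic) scale."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    lam, U = np.linalg.eigh(cov)       # cov = U @ diag(lam) @ U.T
    H = sylvester_hadamard(X.shape[1])
    R = H @ U.T                        # Hadamard-balanced rotation
    sigma = np.sqrt(lam.mean())        # single shared scale
    return (X - mu) @ R.T / sigma

rng = np.random.default_rng(0)
X = rng.normal(size=(4096, 8)) @ rng.normal(size=(8, 8))  # anisotropic toy data
Y = phi_standardize(X)
print(np.var(Y, axis=0, ddof=1))  # every dimension has unit variance
```

Each row of the normalized Hadamard matrix has entries of magnitude 1/sqrt(d), so after rotation every output dimension's variance equals the mean eigenvalue of the covariance; dividing by its square root yields unit variance in all dimensions without per-dimension scales.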
