Survey of Active Learning Hyperparameters: Insights from a Large-Scale Experimental Grid
June 4, 2025
Authors: Julius Gonsior, Tim Rieß, Anja Reusch, Claudio Hartmann, Maik Thiele, Wolfgang Lehner
cs.AI
Abstract
Annotating data is a time-consuming and costly task, but it is inherently
required for supervised machine learning. Active Learning (AL) is an
established method that minimizes human labeling effort by iteratively
selecting the most informative unlabeled samples for expert annotation, thereby
improving the overall classification performance. Even though AL has been known
for decades, it is still rarely used in real-world applications. Two community
web surveys on AL within the NLP community indicate that two main reasons
continue to hold practitioners back from using AL: first, the complexity of
setting AL up, and second, a lack of trust in its effectiveness.
We hypothesize that both reasons share the same culprit: the large
hyperparameter space of AL. This mostly unexplored hyperparameter space often
leads to misleading and irreproducible AL experiment results. In this study, we
first compiled a large hyperparameter grid of over 4.6 million hyperparameter
combinations, second, recorded the performance of all combinations in the
largest AL study conducted so far, and third, analyzed the impact of each
hyperparameter on the experiment results. In the end, we give recommendations
about the influence of each hyperparameter, demonstrate the surprising
influence of the concrete AL strategy implementation, and outline an
experimental study design for reproducible AL experiments with minimal
computational effort, thus contributing to more reproducible and trustworthy AL
research in the future.
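
To make the idea of a hyperparameter grid over AL experiments concrete, the following is a minimal sketch and not the authors' setup. The grid keys and values, the toy 1-D dataset, the threshold learner, and the helper names (`make_pool`, `fit_threshold`, `run_al`, `grid_combinations`) are all assumptions chosen for illustration. It runs one small pool-based AL loop per combination and shows how quickly the number of runs multiplies.

```python
import itertools
import random

# Hypothetical, illustrative grid; names and values are assumptions,
# not the grid used in the study.
GRID = {
    "strategy": ["random", "uncertainty"],
    "batch_size": [1, 5],
    "seed": [0, 1, 2],
}


def make_pool(seed, n=200):
    """Toy 1-D binary task: the label is 1 when x > 0.5."""
    rng = random.Random(seed)
    return [(x, int(x > 0.5)) for x in (rng.random() for _ in range(n))]


def fit_threshold(labeled):
    """Toy learner: threshold halfway between the two class means."""
    zeros = [x for x, y in labeled if y == 0]
    ones = [x for x, y in labeled if y == 1]
    if not zeros or not ones:
        # Only one class seen so far: fall back to the mean of labeled points.
        return sum(x for x, _ in labeled) / len(labeled)
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def run_al(strategy, batch_size, seed, budget=20):
    """Run one AL experiment and return accuracy on the whole pool."""
    rng = random.Random(seed)
    pool = make_pool(seed)
    rng.shuffle(pool)
    labeled, unlabeled = pool[:2], pool[2:]
    while len(labeled) < budget and unlabeled:
        threshold = fit_threshold(labeled)
        if strategy == "uncertainty":
            # Most informative = closest to the current decision boundary.
            unlabeled.sort(key=lambda s: abs(s[0] - threshold))
        else:
            rng.shuffle(unlabeled)
        k = min(batch_size, budget - len(labeled))
        batch, unlabeled = unlabeled[:k], unlabeled[k:]
        labeled += batch  # the simulated oracle reveals the labels
    threshold = fit_threshold(labeled)
    # Simplification: accuracy is measured on the full pool, not a held-out set.
    correct = sum(int(x > threshold) == y for x, y in pool)
    return correct / len(pool)


def grid_combinations(grid):
    """Yield every hyperparameter combination as a dict."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


results = [(cfg, run_al(**cfg)) for cfg in grid_combinations(GRID)]
for cfg, acc in results:
    print(cfg, round(acc, 3))
```

Even this toy grid of three hyperparameters yields 2 × 2 × 3 = 12 runs; every additional hyperparameter multiplies that count again, which is how a realistic grid can reach millions of combinations.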