Survey of Active Learning Hyperparameters: Insights from a Large-Scale Experimental Grid
June 4, 2025
Authors: Julius Gonsior, Tim Rieß, Anja Reusch, Claudio Hartmann, Maik Thiele, Wolfgang Lehner
cs.AI
Abstract
Annotating data is a time-consuming and costly task, but it is inherently
required for supervised machine learning. Active Learning (AL) is an
established method that minimizes human labeling effort by iteratively
selecting the most informative unlabeled samples for expert annotation, thereby
improving the overall classification performance. Even though AL has been known
for decades, it is still rarely used in real-world applications. As indicated
by two web surveys about AL conducted within the NLP community, two main
reasons continue to hold practitioners back from using AL: first, the
complexity of setting AL up, and second, a lack of trust in its effectiveness.
We hypothesize that both reasons share the same culprit: the large
hyperparameter space of AL. This mostly unexplored hyperparameter space often
leads to misleading and irreproducible AL experiment results. In this study, we
first compiled a large grid of over 4.6 million hyperparameter
combinations, second, recorded the performance of all combinations in the
largest AL study conducted so far, and third, analyzed the impact of each
hyperparameter on the experiment results. Finally, we give recommendations
about the influence of each hyperparameter, demonstrate the surprising
influence of the concrete AL strategy implementation, and outline an
experimental study design for reproducible AL experiments with minimal
computational effort, thus contributing to more reproducible and trustworthy AL
research in the future.
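The iterative selection loop described above, and the way its settings multiply into a grid, can be illustrated with a minimal Python sketch. It is not the authors' experimental framework. It assumes a hand-written binary logistic regression as the classifier, least-confident uncertainty sampling (plus a random baseline) as the query strategy, a synthetic 2-D dataset, and a simulated oracle that simply reveals stored labels. All function names (`make_data`, `active_learning`, `run_grid`, etc.) are hypothetical. The keyword arguments `strategy`, `initial_size`, `batch_size`, `iterations`, and `seed` stand in for the kind of hyperparameters the abstract refers to: each is a choice that has to be fixed and reported for an AL experiment to be reproducible.

```python
import itertools
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(model, x):
    """Return P(y = 1 | x) under a (weights, bias) logistic model."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X, y, epochs=200, lr=0.5):
    """Fit binary logistic regression with full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba((w, b), xi) - yi  # gradient of log-loss w.r.t. the logit
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def accuracy(model, X, y):
    hits = sum((predict_proba(model, xi) >= 0.5) == (yi == 1) for xi, yi in zip(X, y))
    return hits / len(X)


def make_data(n, seed):
    """Hypothetical toy task: 2-D Gaussian points labeled by the side of x0 + x1 = 0."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n)]
    y = [1 if x0 + x1 > 0 else 0 for x0, x1 in X]
    return X, y


def active_learning(pool_X, pool_y, test_X, test_y, strategy="uncertainty",
                    initial_size=10, batch_size=5, iterations=10, seed=0):
    """Pool-based AL loop; pool_y stands in for the human annotator (oracle)."""
    rng = random.Random(seed)  # seeds the initial sample and the random strategy
    labeled = rng.sample(range(len(pool_X)), initial_size)
    taken = set(labeled)
    unlabeled = [i for i in range(len(pool_X)) if i not in taken]
    accuracies = []
    for step in range(iterations + 1):
        model = train_logreg([pool_X[i] for i in labeled], [pool_y[i] for i in labeled])
        accuracies.append(accuracy(model, test_X, test_y))
        if step == iterations:
            break
        if strategy == "uncertainty":
            # Least-confident sampling: query points whose P(y=1) is closest to 0.5.
            unlabeled.sort(key=lambda i: abs(predict_proba(model, pool_X[i]) - 0.5))
        elif strategy == "random":
            rng.shuffle(unlabeled)
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
        batch, unlabeled = unlabeled[:batch_size], unlabeled[batch_size:]
        labeled.extend(batch)  # the oracle reveals pool_y for these indices
    return labeled, accuracies


def run_grid(strategies=("uncertainty", "random"), batch_sizes=(1, 5), seeds=(0, 1)):
    """Evaluate every combination of a toy hyperparameter grid."""
    pool_X, pool_y = make_data(200, seed=42)
    test_X, test_y = make_data(200, seed=7)
    results = {}
    for strategy, batch_size, seed in itertools.product(strategies, batch_sizes, seeds):
        _, accs = active_learning(pool_X, pool_y, test_X, test_y, strategy=strategy,
                                  batch_size=batch_size, seed=seed)
        results[(strategy, batch_size, seed)] = accs[-1]  # final test accuracy
    return results
```

Calling `run_grid()` returns the final test accuracy for each (strategy, batch size, seed) cell of a toy 2 × 2 × 2 grid. Every additional hyperparameter axis multiplies the number of cells, which is how a realistic grid grows to millions of combinations.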