ResumeAtlas: Revisiting Resume Classification with Large-Scale Datasets and Large Language Models
June 26, 2024
Authors: Ahmed Heakl, Youssef Mohamed, Noran Mohamed, Ali Sharkaway, Ahmed Zaky
cs.AI
Abstract
The increasing reliance on online recruitment platforms coupled with the
adoption of AI technologies has highlighted the critical need for efficient
resume classification methods. However, challenges such as small datasets, lack
of standardized resume templates, and privacy concerns hinder the accuracy and
effectiveness of existing classification models. In this work, we address these
challenges by presenting a comprehensive approach to resume classification. We
curated a large-scale dataset of 13,389 resumes from diverse sources and
employed Large Language Models (LLMs) such as BERT and Gemma1.1 2B for
classification. Our results demonstrate significant improvements over
traditional machine learning approaches, with our best model achieving a top-1
accuracy of 92% and a top-5 accuracy of 97.5%. These findings underscore the
importance of dataset quality and advanced model architectures in enhancing the
accuracy and robustness of resume classification systems, thus advancing the
field of online recruitment practices.
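
The abstract reports top-1 and top-5 accuracy without defining them. As a minimal sketch of how such metrics are commonly computed (not the authors' evaluation code), a prediction counts as a top-k hit when the true category is among the k highest-scoring classes. The function name `top_k_accuracy` and the toy scores and labels below are illustrative assumptions and do not come from the paper.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int,
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices by descending score and keep the first k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Hypothetical toy data: 4 resumes, 3 job categories (not from the paper).
toy_scores = [
    [0.70, 0.20, 0.10],  # true class 0: ranked first
    [0.35, 0.45, 0.20],  # true class 0: ranked second
    [0.10, 0.30, 0.60],  # true class 0: ranked third
    [0.20, 0.50, 0.30],  # true class 1: ranked first
]
toy_labels = [0, 0, 0, 1]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

On this toy data two of four samples are top-1 hits and three of four are top-2 hits, which shows why top-5 accuracy is always at least as high as top-1 accuracy on the same predictions.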