ResumeAtlas: Revisiting Resume Classification with Large-Scale Datasets and Large Language Models
June 26, 2024
Authors: Ahmed Heakl, Youssef Mohamed, Noran Mohamed, Ali Sharkaway, Ahmed Zaky
cs.AI
Abstract
The increasing reliance on online recruitment platforms coupled with the
adoption of AI technologies has highlighted the critical need for efficient
resume classification methods. However, challenges such as small datasets, lack
of standardized resume templates, and privacy concerns hinder the accuracy and
effectiveness of existing classification models. In this work, we address these
challenges by presenting a comprehensive approach to resume classification. We
curated a large-scale dataset of 13,389 resumes from diverse sources and
employed Large Language Models (LLMs) such as BERT and Gemma1.1 2B for
classification. Our results demonstrate significant improvements over
traditional machine learning approaches, with our best model achieving a top-1
accuracy of 92% and a top-5 accuracy of 97.5%. These findings underscore the
importance of dataset quality and advanced model architectures in enhancing the
accuracy and robustness of resume classification systems, thus advancing the
field of online recruitment practices.