ResumeAtlas: Revisiting Resume Classification with Large-Scale Datasets and Large Language Models

Abstract

The increasing reliance on online recruitment platforms, coupled with the adoption of AI technologies, has highlighted the critical need for efficient resume classification methods. However, challenges such as small datasets, the lack of standardized resume templates, and privacy concerns hinder the accuracy and effectiveness of existing classification models. In this work, we address these challenges by presenting a comprehensive approach to resume classification. We curated a large-scale dataset of 13,389 resumes from diverse sources and employed Large Language Models (LLMs) such as BERT and Gemma1.1 2B for classification. Our results demonstrate significant improvements over traditional machine learning approaches, with our best model achieving a top-1 accuracy of 92% and a top-5 accuracy of 97.5%. These findings underscore the importance of dataset quality and advanced model architectures in enhancing the accuracy and robustness of resume classification systems, thus advancing the field of online recruitment practices.
