Benchmarking Recommendation, Classification, and Tracing Based on Hugging Face Knowledge Graph

Abstract

The rapid growth of open-source machine learning (ML) resources, such as models and datasets, has accelerated information retrieval (IR) research. However, existing platforms like Hugging Face do not explicitly use structured representations, which limits advanced queries and analyses such as tracing model evolution and recommending relevant datasets. To fill this gap, we construct HuggingKG, the first large-scale knowledge graph built from the Hugging Face community for ML resource management. With 2.6 million nodes and 6.2 million edges, HuggingKG captures domain-specific relations and rich textual attributes. It enables us to further present HuggingBench, a multi-task benchmark with three novel test collections for IR tasks: resource recommendation, classification, and tracing. Our experiments reveal unique characteristics of HuggingKG and the derived tasks. Both resources are publicly available and are expected to advance research in open-source resource sharing and management.
