Benchmarking Recommendation, Classification, and Tracing Based on Hugging Face Knowledge Graph
May 23, 2025
Authors: Qiaosheng Chen, Kaijia Huang, Xiao Zhou, Weiqing Luo, Yuanning Cui, Gong Cheng
cs.AI
Abstract
The rapid growth of open-source machine learning (ML) resources, such as
models and datasets, has accelerated information retrieval (IR) research. However, existing platforms
like Hugging Face do not explicitly utilize structured representations,
limiting advanced queries and analyses such as tracing model evolution and
recommending relevant datasets. To fill the gap, we construct HuggingKG, the
first large-scale knowledge graph built from the Hugging Face community for ML
resource management. With 2.6 million nodes and 6.2 million edges, HuggingKG
captures domain-specific relations and rich textual attributes. It enables us
to further present HuggingBench, a multi-task benchmark with three novel test
collections for IR tasks including resource recommendation, classification, and
tracing. Our experiments reveal unique characteristics of HuggingKG and the
derived tasks. Both resources are publicly available and are expected to advance
research in open-source resource sharing and management.