Benchmarking Recommendation, Classification, and Tracing Based on Hugging Face Knowledge Graph

Abstract

The rapid growth of open-source machine learning (ML) resources, such as models and datasets, has accelerated information retrieval (IR) research. However, existing platforms such as Hugging Face do not explicitly use structured representations, which limits advanced queries and analyses such as tracing model evolution and recommending relevant datasets. To fill this gap, we construct HuggingKG, the first large-scale knowledge graph built from the Hugging Face community for ML resource management. With 2.6 million nodes and 6.2 million edges, HuggingKG captures domain-specific relations and rich textual attributes. Building on it, we present HuggingBench, a multi-task benchmark with three novel test collections for IR tasks: resource recommendation, classification, and tracing. Our experiments reveal unique characteristics of HuggingKG and the derived tasks. Both resources are publicly available and are expected to advance research in open-source resource sharing and management.
