Charting and Navigating Hugging Face's Model Atlas

Abstract

As there are now millions of publicly available neural networks, searching and analyzing large model repositories is becoming increasingly important. Navigating so many models requires an atlas, but because most models are poorly documented, charting such an atlas is challenging. To explore the hidden potential of model repositories, we chart a preliminary atlas representing the documented fraction of Hugging Face. It provides stunning visualizations of the model landscape and its evolution. We demonstrate several applications of this atlas, including predicting model attributes (e.g., accuracy) and analyzing trends in computer vision models. However, as the current atlas remains incomplete, we propose a method for charting undocumented regions. Specifically, we identify high-confidence structural priors based on dominant real-world model training practices. Leveraging these priors, our approach enables accurate mapping of previously undocumented areas of the atlas. We publicly release our datasets, code, and interactive atlas.
