GraphNet: A Large-Scale Computational Graph Dataset for Tensor Compiler Research

Abstract

We introduce GraphNet, a dataset of 2.7K real-world deep learning computational graphs with rich metadata, spanning six major task categories across multiple deep learning frameworks. To evaluate tensor compiler performance on these samples, we propose a benchmark metric, the Speedup Score S(t), which jointly considers runtime speedup and execution correctness under tunable tolerance levels t, offering a reliable measure of general optimization capability. We further extend S(t) to the Error-aware Speedup Score ES(t), which incorporates error information to help compiler developers identify key performance bottlenecks. In this report, we benchmark the default tensor compilers, CINN for PaddlePaddle and TorchInductor for PyTorch, on computer vision (CV) and natural language processing (NLP) samples to demonstrate the practicality of GraphNet. The full construction pipeline, including graph extraction and compiler evaluation tools, is available at https://github.com/PaddlePaddle/GraphNet.
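The abstract only summarizes what S(t) combines (speedup and correctness under a tolerance level t); its exact formula is given in the full report and is not reproduced here. The Python sketch below is a hypothetical toy aggregate, not the paper's S(t), meant only to show the general idea of a tolerance-gated speedup score. The names `Sample`, `is_correct`, `toy_speedup_score`, the mapping from t to a tolerance of 10**t, and the `fail_penalty` value are all illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    # Hypothetical per-graph record: eager (uncompiled) and compiled runtimes
    # in seconds, plus flattened reference and compiled outputs.
    eager_time: float
    compiled_time: float
    reference: List[float]
    compiled: List[float]


def is_correct(sample: Sample, t: int) -> bool:
    """Assumed correctness check: every output element must match within a
    relative and absolute tolerance of 10**t (larger t means looser)."""
    tol = 10.0 ** t
    if len(sample.reference) != len(sample.compiled):
        return False
    return all(
        math.isclose(r, c, rel_tol=tol, abs_tol=tol)
        for r, c in zip(sample.reference, sample.compiled)
    )


def toy_speedup_score(samples: List[Sample], t: int,
                      fail_penalty: float = 0.1) -> float:
    """Toy aggregate, NOT the paper's S(t): geometric mean of per-sample
    speedups, where a sample failing the tolerance check at level t
    contributes `fail_penalty` instead of its measured speedup."""
    logs = []
    for s in samples:
        speedup = s.eager_time / s.compiled_time
        factor = speedup if is_correct(s, t) else fail_penalty
        logs.append(math.log(factor))
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    samples = [
        Sample(2.0, 1.0, [1.0, 2.0], [1.0, 2.0]),    # 2x faster, exact match
        Sample(1.0, 0.5, [1.0, 2.0], [1.001, 2.0]),  # 2x faster, small error
    ]
    print(toy_speedup_score(samples, t=-5))  # ≈ 0.447: second sample fails
    print(toy_speedup_score(samples, t=-2))  # ≈ 2.0: both samples pass
```

A geometric mean is used here because speedups are ratios, and substituting a penalty for incorrect samples keeps a compiler from being rewarded for fast but wrong graphs; tightening t (a more negative value in this toy) can only lower the score.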