GraphNet: A Large-Scale Computational Graph Dataset for Tensor Compiler Research

Abstract

We introduce GraphNet, a dataset of 2.7K real-world deep learning computational graphs with rich metadata, spanning six major task categories across multiple deep learning frameworks. To evaluate tensor compiler performance on these samples, we propose the benchmark metric Speedup Score S(t), which jointly considers runtime speedup and execution correctness under tunable tolerance levels and thereby offers a reliable measure of general optimization capability. We further extend S(t) to the Error-aware Speedup Score ES(t), which incorporates error information and helps compiler developers identify key performance bottlenecks. In this report, we benchmark the default tensor compilers, CINN for PaddlePaddle and TorchInductor for PyTorch, on computer vision (CV) and natural language processing (NLP) samples to demonstrate the practicality of GraphNet. The full construction pipeline, including the graph extraction and compiler evaluation tools, is available at https://github.com/PaddlePaddle/GraphNet.

