CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark
May 22, 2025
Authors: Ahmed Heakl, Sarim Hashmi, Gustavo Bertolo Stahl, Seung Hun Eddie Han, Salman Khan, Abdulrahman Mahmoud
cs.AI
Abstract
We introduce CASS, the first large-scale dataset and model suite for
cross-architecture GPU code transpilation, targeting both source-level (CUDA ↔
HIP) and assembly-level (Nvidia SASS ↔ AMD
RDNA3) translation. The dataset comprises 70k verified code pairs across host
and device, addressing a critical gap in low-level GPU code portability.
Leveraging this resource, we train the CASS family of domain-specific language
models, achieving 95% source translation accuracy and 37.5% assembly
translation accuracy, substantially outperforming commercial baselines such as
GPT-4o, Claude, and Hipify. Our generated code matches native performance in
over 85% of test cases, preserving runtime and memory behavior. To support
rigorous evaluation, we introduce CASS-Bench, a curated benchmark spanning 16
GPU domains with ground-truth execution. All data, models, and evaluation tools
are released as open source to foster progress in GPU compiler tooling, binary
compatibility, and LLM-guided hardware translation. Dataset and benchmark are
available on HuggingFace (https://huggingface.co/datasets/MBZUAI/cass), with
code on GitHub (https://github.com/GustavoStahl/CASS).
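
To make "source-level (CUDA ↔ HIP) translation" concrete, here is a minimal sketch of the simplest possible approach: renaming CUDA runtime identifiers to their HIP counterparts. It is neither the CASS models nor Hipify's implementation. The function name `toy_cuda_to_hip`, the sample kernel, and the small mapping table are illustrative assumptions, and the table covers only a handful of runtime calls.

```python
import re

# Illustrative subset of CUDA runtime names and their HIP counterparts.
# Real translation needs far more than this table (libraries, types, build flags).
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Word boundaries ensure only whole identifiers from the table are rewritten.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in CUDA_TO_HIP) + r")\b"
)


def toy_cuda_to_hip(source: str) -> str:
    """Rename the CUDA identifiers listed in CUDA_TO_HIP; leave everything else as is."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(1)], source)


# Hypothetical input: a small kernel plus the host code that launches it.
cuda_src = """#include <cuda_runtime.h>
__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}
void run(float *h, int n) {
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
}
"""

if __name__ == "__main__":
    print(toy_cuda_to_hip(cuda_src))
```

The kernel body needs no changes here because HIP keeps the same built-ins (`blockIdx`, `threadIdx`) and launch syntax. A rename table like this has no counterpart at the assembly level, since SASS and RDNA3 are different instruction sets.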