

CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark

May 22, 2025
Authors: Ahmed Heakl, Sarim Hashmi, Gustavo Bertolo Stahl, Seung Hun Eddie Han, Salman Khan, Abdulrahman Mahmoud
cs.AI

Abstract

We introduce CASS, the first large-scale dataset and model suite for cross-architecture GPU code transpilation, targeting both source-level (CUDA ↔ HIP) and assembly-level (Nvidia SASS ↔ AMD RDNA3) translation. The dataset comprises 70k verified host and device code pairs, addressing a critical gap in low-level GPU code portability. Using this resource, we train the CASS family of domain-specific language models, which achieve 95% source translation accuracy and 37.5% assembly translation accuracy, substantially outperforming commercial models such as GPT-4o and Claude as well as the Hipify translation tool. Our generated code matches native performance in over 85% of test cases, preserving runtime and memory behavior. To support rigorous evaluation, we introduce CASS-Bench, a curated benchmark spanning 16 GPU domains with ground-truth execution. All data, models, and evaluation tools are released as open source to foster progress in GPU compiler tooling, binary compatibility, and LLM-guided hardware translation. The dataset and benchmark are available on HuggingFace (https://huggingface.co/datasets/MBZUAI/cass), and the code is on GitHub (https://github.com/GustavoStahl/CASS).
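To make the source-level task concrete: HIP's runtime API largely mirrors CUDA's with a `hip` prefix, and rule-based tools such as Hipify exploit this by rewriting identifiers. The Python sketch below is a toy illustration of that rule-based idea only. The rename table `CUDA_TO_HIP` and the function `cuda_to_hip` are hypothetical, cover just a handful of runtime calls, and are neither Hipify's actual implementation nor the method used by the CASS models. Nothing this simple applies at the assembly level, where SASS and RDNA3 are unrelated instruction sets.

```python
import re

# Hypothetical, deliberately tiny rename table (an assumption for illustration):
# a few CUDA runtime identifiers and their HIP counterparts.
# Real tools such as Hipify cover far more of the API surface.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaError_t": "hipError_t",
    "cudaSuccess": "hipSuccess",
}

# Longest names first so the alternation prefers full identifiers;
# \b boundaries keep us from rewriting substrings of unrelated identifiers.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(name) for name in sorted(CUDA_TO_HIP, key=len, reverse=True))
    + r")\b"
)


def cuda_to_hip(source: str) -> str:
    """Rewrite known CUDA runtime identifiers to HIP; leave everything else as-is."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(1)], source)


cuda_src = """#include <cuda_runtime.h>

__global__ void scale(float* x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1024;
    float h_x[n] = {0};
    float* d_x;
    cudaMalloc((void**)&d_x, n * sizeof(float));
    cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
    scale<<<4, 256>>>(d_x, 2.0f, n);
    cudaDeviceSynchronize();
    cudaMemcpy(h_x, d_x, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    return 0;
}
"""

hip_src = cuda_to_hip(cuda_src)
print(hip_src)
```

Kernel code and the `<<<...>>>` launch syntax pass through unchanged, because HIP accepts both. Anything outside the table, such as library calls (cuBLAS and friends), warp-size assumptions, or inline PTX, would still need separate handling, and that is where a pure lookup approach runs out.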

