CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark

Abstract

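We introduce CASS, the first large-scale dataset and model suite for cross-architecture GPU code transpilation, targeting both source-level (CUDA ↔ HIP) and assembly-level (Nvidia SASS ↔ AMD RDNA3) translation. The dataset comprises 70k verified code pairs spanning host and device code, addressing a critical gap in low-level GPU code portability. Using this resource, we train the CASS family of domain-specific language models, which achieve 95% source translation accuracy and 37.5% assembly translation accuracy, substantially outperforming commercial baselines such as GPT-4o, Claude, and Hipify. The generated code matches native performance in over 85% of test cases, preserving runtime and memory behavior. To support rigorous evaluation, we introduce CASS-Bench, a curated benchmark spanning 16 GPU domains with ground-truth execution. All data, models, and evaluation tools are released as open source to foster progress in GPU compiler tooling, binary compatibility, and LLM-guided hardware translation. The dataset and benchmark are available on HuggingFace (https://huggingface.co/datasets/MBZUAI/cass), and the code is on GitHub (https://github.com/GustavoStahl/CASS).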


