CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark

Abstract

We introduce CASS, the first large-scale dataset and model suite for cross-architecture GPU code transpilation, targeting both source-level (CUDA ↔ HIP) and assembly-level (Nvidia SASS ↔ AMD RDNA3) translation. The dataset comprises 70k verified code pairs across host and device, addressing a critical gap in low-level GPU code portability. Leveraging this resource, we train the CASS family of domain-specific language models, which achieve 95% source translation accuracy and 37.5% assembly translation accuracy, substantially outperforming commercial baselines such as GPT-4o, Claude, and Hipify. Our generated code matches native performance in over 85% of test cases, preserving runtime and memory behavior. To support rigorous evaluation, we introduce CASS-Bench, a curated benchmark spanning 16 GPU domains with ground-truth execution. All data, models, and evaluation tools are released as open source to foster progress in GPU compiler tooling, binary compatibility, and LLM-guided hardware translation. The dataset and benchmark are available on HuggingFace (https://huggingface.co/datasets/MBZUAI/cass), and the code is on GitHub (https://github.com/GustavoStahl/CASS).

