Guaranteed Guess: A Language Modeling Approach for CISC-to-RISC Transpilation with Testing Guarantees
June 17, 2025
Authors: Ahmed Heakl, Sarim Hashmi, Chaimaa Abi, Celine Lee, Abdulrahman Mahmoud
cs.AI
Abstract
The hardware ecosystem is rapidly evolving, and there is growing interest in translating low-level programs across instruction set architectures (ISAs) quickly, flexibly, and correctly, to improve the portability and longevity of existing code. A particularly challenging class of this transpilation problem is translating between complex instruction set (CISC) and reduced instruction set (RISC) architectures, owing to fundamental differences in instruction complexity, memory models, and execution paradigms. In this work, we introduce GG (Guaranteed Guess), an ISA-centric transpilation pipeline that combines the translation power of pre-trained large language models (LLMs) with the rigor of established software testing constructs. Our method uses an LLM to generate candidate translations from one ISA to another, and embeds those translations in a software-testing framework to build quantifiable confidence in the result. We evaluate GG on two diverse datasets, enforce high code coverage (>98%) across unit tests, and achieve 99% functional/semantic correctness on HumanEval programs and 49% on BringupBench programs. Further, we compare our approach to the state-of-the-art Rosetta 2 framework on Apple Silicon. Our transpiled code runs 1.73x faster, is 1.47x more energy efficient, and uses memory 2.41x more efficiently, demonstrating the effectiveness of GG for real-world CISC-to-RISC translation tasks. We will open-source our code, data, models, and benchmarks to establish a common foundation for ISA-level code translation research.
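The "generate a guess, then test it" idea in the abstract can be illustrated with a toy sketch. This is not the authors' pipeline; every name below (`IOPair`, `passes_all`, `guess_and_check`) is hypothetical. It assumes each LLM-generated candidate can be executed on a test input. A real system would assemble the RISC output and run it natively or under an emulator; here each candidate is simply a Python callable. The 0.98 minimum-coverage gate is an assumption chosen to echo the >98% coverage figure above. The sketch accepts the first candidate whose outputs match the original program's reference outputs on every test.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical stand-in for a translated program: in a real pipeline this would be
# LLM-generated RISC assembly, assembled and executed; here it is a Python callable.
Candidate = Callable[[int], int]


@dataclass
class IOPair:
    arg: int       # test input
    expected: int  # output recorded from the original (CISC) program


def passes_all(candidate: Candidate, tests: Sequence[IOPair]) -> bool:
    """Return True only if the candidate reproduces every reference output."""
    for t in tests:
        try:
            if candidate(t.arg) != t.expected:
                return False
        except Exception:
            # A crash (in real code, e.g. a bad memory access) counts as a failure.
            return False
    return True


def guess_and_check(
    candidates: Sequence[Candidate],
    tests: Sequence[IOPair],
    line_coverage: float,
    min_coverage: float = 0.98,  # assumed threshold, not taken from the paper's code
) -> Optional[int]:
    """Return the index of the first candidate that passes all tests, else None.

    Passing results are only trusted if the test suite covers enough of the
    source program, so low coverage is rejected up front.
    """
    if line_coverage < min_coverage:
        raise ValueError(f"coverage {line_coverage:.1%} is below {min_coverage:.0%}")
    for i, cand in enumerate(candidates):
        if passes_all(cand, tests):
            return i
    return None  # no guess survived testing; a caller might re-prompt the model


# Reference behaviour of a toy source program: f(x) = 3*x + 1
tests = [IOPair(0, 1), IOPair(2, 7), IOPair(-5, -14)]
candidates = [
    lambda x: 3 * x,      # wrong guess: drops the "+ 1"
    lambda x: 3 * x + 1,  # correct guess
]
print(guess_and_check(candidates, tests, line_coverage=0.99))  # prints 1
```

The coverage check runs before any candidate is accepted. Without it, a weak test suite could report that a wrong translation "passed". The tests can only give confidence in a translation if they actually exercise most of the source program.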