Guaranteed Guess: A Language Modeling Approach for CISC-to-RISC Transpilation with Testing Guarantees
June 17, 2025
Authors: Ahmed Heakl, Sarim Hashmi, Chaimaa Abi, Celine Lee, Abdulrahman Mahmoud
cs.AI
Abstract
The hardware ecosystem is rapidly evolving, with increasing interest in
translating low-level programs across different instruction set architectures
(ISAs) in a quick, flexible, and correct way to enhance the portability and
longevity of existing code. A particularly challenging class of this
transpilation problem is translating between complex (CISC) and reduced (RISC)
instruction set architectures, due to fundamental differences in instruction
complexity, memory models, and execution paradigms. In this work, we introduce
GG (Guaranteed Guess), an ISA-centric transpilation pipeline that combines the
translation power of pre-trained large language models (LLMs) with the rigor of
established software testing constructs. Our method generates candidate
translations using an LLM from one ISA to another, and embeds such translations
within a software-testing framework to build quantifiable confidence in the
translation. We evaluate our GG approach over two diverse datasets, enforce
high code coverage (>98%) across unit tests, and achieve functional/semantic
correctness of 99% on HumanEval programs and 49% on BringupBench programs.
Further, we compare our approach to the state-of-the-art Rosetta
2 framework on Apple Silicon, showcasing 1.73x faster runtime performance,
1.47x better energy efficiency, and 2.41x better memory usage for our
transpiled code, demonstrating the effectiveness of GG for real-world
CISC-to-RISC translation tasks. We will open-source our code, data, models,
and benchmarks to establish a common foundation for ISA-level code translation
research.
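
The generate-then-test idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `pass_rate` and `select_translation` are hypothetical, and each candidate translation is modeled as a plain Python callable standing in for "assemble and execute the translated program", which a real pipeline would do on the target ISA.

```python
from typing import Callable, Optional, Sequence, Tuple

# A test case pairs input arguments with the expected result.
TestCase = Tuple[tuple, object]


def pass_rate(run: Callable[..., object], tests: Sequence[TestCase]) -> float:
    """Return the fraction of test cases on which `run` gives the expected value."""
    if not tests:
        return 0.0
    passed = 0
    for args, expected in tests:
        try:
            if run(*args) == expected:
                passed += 1
        except Exception:
            # A crashing candidate simply fails this test case.
            pass
    return passed / len(tests)


def select_translation(
    candidates: Sequence[Callable[..., object]],
    tests: Sequence[TestCase],
    threshold: float = 1.0,
) -> Optional[Tuple[int, float]]:
    """Return (index, pass rate) of the first candidate meeting the threshold.

    Hypothetical helper: `candidates` stands in for LLM-generated translations,
    and the unit tests act as the acceptance gate. Returns None if none qualifies.
    """
    for index, candidate in enumerate(candidates):
        rate = pass_rate(candidate, tests)
        if rate >= threshold:
            return index, rate
    return None


# Toy usage: the source program adds two integers.
tests = [((1, 2), 3), ((-1, 1), 0), ((0, 0), 0)]
buggy = lambda a, b: a - b    # stands in for an incorrect translation
correct = lambda a, b: a + b  # stands in for a correct translation
result = select_translation([buggy, correct], tests)
```

In this toy setting the first candidate is rejected because it fails some tests, and the second is accepted. The confidence such a gate provides depends on how thoroughly the tests exercise the program, which is why the abstract pairs the correctness numbers with a code coverage figure.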