EVOC2RUST: A Skeleton-guided Framework for Project-Level C-to-Rust Translation
August 6, 2025
Authors: Chaofan Wang, Tingrui Yu, Jie Wang, Dong Chen, Wenrui Zhang, Yuling Shi, Xiaodong Gu, Beijun Shen
cs.AI
Abstract
Rust's compile-time safety guarantees make it ideal for safety-critical
systems, creating demand for translating legacy C codebases to Rust. While
various approaches have emerged for this task, they face inherent trade-offs:
rule-based solutions struggle to meet code safety and idiomaticity
requirements, while LLM-based solutions often fail to generate semantically
equivalent Rust code because of the heavy dependencies between modules across the
entire codebase. Recent studies have revealed that both solutions are limited
to small-scale programs. In this paper, we propose EvoC2Rust, an automated
framework for converting entire C projects to equivalent Rust ones. EvoC2Rust
employs a skeleton-guided translation strategy for project-level translation.
The pipeline consists of three evolutionary stages: 1) it first decomposes the
C project into functional modules, employs a feature-mapping-enhanced LLM to
transform definitions and macros, and generates type-checked function stubs,
which together form a compilable Rust skeleton; 2) it then incrementally
translates each function, replacing the corresponding stub placeholder; 3) finally,
it repairs compilation errors by combining an LLM with static analysis. Through evolutionary
augmentation, EvoC2Rust combines the advantages of both rule-based and
LLM-based solutions. Our evaluation on open-source benchmarks and six
industrial projects demonstrates EvoC2Rust's superior performance in
project-level C-to-Rust translation. On average, it improves syntactic and
semantic accuracy by 17.24% and 14.32%, respectively, over LLM-based approaches,
and achieves a 96.79% higher code safety rate than rule-based tools. At the
module level, EvoC2Rust reaches 92.25% compilation and 89.53% test pass rates
on industrial projects, even for complex codebases and long functions.
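
To make the skeleton-guided idea concrete, the following is a minimal sketch of what an intermediate Rust skeleton might look like partway through the pipeline. Everything in it is an assumption for illustration: the C declarations in the comment, the names `Stack`, `MAX_LEN`, `stack_push`, and `stack_sum`, and the use of `unimplemented!` as the stub body are hypothetical and are not taken from the paper. The point is only that a stub with a complete signature lets the whole crate compile and type-check before every function body has been translated.

```rust
// Hypothetical C input (illustrative only, not from the paper):
//   #define MAX_LEN 16
//   typedef struct { int data[MAX_LEN]; int len; } Stack;
//   int stack_push(Stack *s, int v);   /* returns 0 when full */
//   int stack_sum(const Stack *s);

// Stage 1: macros and type definitions are translated first.
pub const MAX_LEN: usize = 16;

#[derive(Debug, Default)]
pub struct Stack {
    data: [i32; MAX_LEN],
    len: usize,
}

// Stage 1 (continued): a function stub. The signature is complete, so
// callers type-check, but the body is only a placeholder. Assumption:
// the placeholder here is `unimplemented!`; the paper's abstract does
// not specify the stub form.
pub fn stack_sum(_s: &Stack) -> i32 {
    unimplemented!("stub: body not yet translated")
}

// Stage 2: this stub has already been replaced by a translated body.
// The signature is unchanged, so the rest of the skeleton still compiles.
pub fn stack_push(s: &mut Stack, v: i32) -> bool {
    if s.len == MAX_LEN {
        return false; // stack is full
    }
    s.data[s.len] = v;
    s.len += 1;
    true
}
```

In this sketch, `stack_push` can already be compiled and tested while `stack_sum` is still a stub; calling the stub would panic at run time, but it does not block compilation of the surrounding module.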