EvoC2Rust: A Skeleton-guided Framework for Project-Level C-to-Rust Translation

Abstract

Rust's compile-time safety guarantees make it well suited to safety-critical systems, which creates demand for translating legacy C codebases to Rust. Various approaches have emerged for this task, but they face inherent trade-offs: rule-based solutions struggle to meet code safety and idiomaticity requirements, while LLM-based solutions often fail to generate semantically equivalent Rust code because modules depend heavily on one another across the codebase. Recent studies have shown that both kinds of solution are limited to small-scale programs. In this paper, we propose EvoC2Rust, an automated framework for converting entire C projects to equivalent Rust projects. EvoC2Rust employs a skeleton-guided translation strategy for project-level translation. The pipeline consists of three evolutionary stages: 1) it first decomposes the C project into functional modules, uses a feature-mapping-enhanced LLM to transform definitions and macros, and generates type-checked function stubs, which together form a compilable Rust skeleton; 2) it then incrementally translates the functions, replacing the corresponding stub placeholders; 3) finally, it repairs compilation errors by integrating an LLM with static analysis. Through evolutionary augmentation, EvoC2Rust combines the advantages of rule-based and LLM-based solutions. Our evaluation on open-source benchmarks and six industrial projects demonstrates EvoC2Rust's superior performance in project-level C-to-Rust translation. On average, it achieves 17.24% and 14.32% improvements in syntax and semantic accuracy, respectively, over the LLM-based approaches, along with a 96.79% higher code safety rate than the rule-based tools. At the module level, EvoC2Rust reaches a 92.25% compilation pass rate and an 89.53% test pass rate on industrial projects, even for complex codebases and long functions.
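To make the skeleton idea concrete, the following is a minimal sketch of what a compilable Rust skeleton with one stub and one translated function might look like. It is an illustration under stated assumptions, not the output format used by EvoC2Rust: the C declarations in the comment, the `IntVec` type, and the function names `intvec_sum` and `intvec_max` are all hypothetical, and the use of `unimplemented!` as the stub placeholder is an assumption.

```rust
// Hypothetical C source being translated (illustrative only, not from the paper):
//   typedef struct { int *data; size_t len; } IntVec;
//   int intvec_sum(const IntVec *v);
//   int intvec_max(const IntVec *v, int *out);

/// Stage 1 (assumed shape): type definitions are translated first so that
/// every later function signature can be type-checked against them.
#[derive(Debug, Default)]
pub struct IntVec {
    pub data: Vec<i32>,
}

/// Stage 1 (assumed shape): a function stub. The signature is already fixed
/// and type-checks, but the body is a placeholder, so the crate compiles
/// before any function logic has been translated.
pub fn intvec_max(_v: &IntVec) -> Option<i32> {
    unimplemented!("stub: body not yet translated")
}

/// Stage 2 (assumed shape): the stub for `intvec_sum` has been replaced by a
/// translated body. Callers are unaffected because the signature is unchanged.
pub fn intvec_sum(v: &IntVec) -> i32 {
    v.data.iter().sum()
}
```

Because the stub and the translated function share the skeleton's signatures, each function can be filled in and compiled independently, which is the property the incremental second stage relies on. Calling `intvec_max` in this sketch would panic until its body is translated.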
