EVOC2RUST: A Skeleton-guided Framework for Project-Level C-to-Rust Translation

Abstract

Rust's compile-time safety guarantees make it well suited to safety-critical systems, creating demand for translating legacy C codebases into Rust. Various approaches have emerged for this task, but each involves inherent trade-offs: rule-based solutions struggle to meet code-safety and idiomaticity requirements, while LLM-based solutions often fail to generate semantically equivalent Rust code because of the heavy cross-module dependencies throughout the codebase. Recent studies have shown that both kinds of solution are limited to small-scale programs. In this paper, we propose EvoC2Rust, an automated framework for converting entire C projects into equivalent Rust projects. EvoC2Rust employs a skeleton-guided strategy for project-level translation. The pipeline consists of three evolutionary stages: 1) it first decomposes the C project into functional modules, uses a feature-mapping-enhanced LLM to transform definitions and macros, and generates type-checked function stubs, which together form a compilable Rust skeleton; 2) it then incrementally translates the functions, replacing each corresponding stub placeholder; 3) finally, it repairs compilation errors by integrating LLM and static analysis. Through this evolutionary augmentation, EvoC2Rust combines the advantages of both rule-based and LLM-based solutions. Our evaluation on open-source benchmarks and six industrial projects demonstrates EvoC2Rust's superior performance in project-level C-to-Rust translation. On average, it achieves 17.24% and 14.32% improvements in syntactic and semantic accuracy over the LLM-based approaches, along with a 96.79% higher code safety rate than the rule-based tools. At the module level, EvoC2Rust reaches 92.25% compilation and 89.53% test pass rates on industrial projects, even for complex codebases and long functions.
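The skeleton-then-fill idea in stages 1 and 2 can be illustrated with a minimal, hypothetical sketch. The C input (shown in comments), the names `MAX_LEN`, `Buffer`, and `buffer_sum`, and the use of `unimplemented!()` as the stub placeholder are all assumptions for illustration; the abstract does not specify what EvoC2Rust's stubs or generated code actually look like. Stage 1 maps the macro to a `const` and the struct to a Rust struct, and emits a stub with the final signature so the skeleton compiles; stage 2 swaps the stub body for a translated implementation.

```rust
// Hypothetical C input (illustration only, not from the paper):
//   #define MAX_LEN 16
//   typedef struct { int data[MAX_LEN]; size_t len; } Buffer;
//   int buffer_sum(const Buffer *b);

// Stage 1: definitions and macros mapped to Rust items.
// The object-like macro becomes a typed constant.
pub const MAX_LEN: usize = 16;

// The C struct becomes a plain Rust struct with the same fields.
#[derive(Debug, Clone, Copy, Default)]
pub struct Buffer {
    pub data: [i32; MAX_LEN],
    pub len: usize,
}

// Stage 1: a type-checked stub. The signature is final and the body is a
// placeholder (assumed here to be `unimplemented!`), so the skeleton compiles.
mod skeleton {
    use super::Buffer;

    pub fn buffer_sum(_b: &Buffer) -> i32 {
        unimplemented!("buffer_sum: awaiting translation")
    }
}

// Stage 2: the stub body is replaced by a translated implementation
// with exactly the same signature.
mod translated {
    use super::Buffer;

    pub fn buffer_sum(b: &Buffer) -> i32 {
        // Slice indexing is bounds-checked: a `len` larger than MAX_LEN
        // panics instead of reading past the array as the C code could.
        b.data[..b.len].iter().sum()
    }
}

fn main() {
    // The stub already type-checks against callers expecting this signature,
    // even though calling it would panic.
    let _stub: fn(&Buffer) -> i32 = skeleton::buffer_sum;

    let mut b = Buffer::default();
    b.data[..3].copy_from_slice(&[1, 2, 3]);
    b.len = 3;
    println!("{}", translated::buffer_sum(&b)); // prints 6
}
```

Because the stub carries the final signature from the start, other modules can be compiled against it before any function body is translated, and filling in the body later cannot change the interface they depend on.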