CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation

Abstract

C-to-Rust transpilation is essential for modernizing legacy C code while enhancing safety and interoperability with modern Rust ecosystems. However, no dataset currently exists for evaluating whether a system can transpile C into safe Rust that passes a set of test cases. We introduce CRUST-Bench, a dataset of 100 C repositories, each paired with manually written interfaces in safe Rust as well as test cases that can be used to validate the correctness of the transpilation. By considering entire repositories rather than isolated functions, CRUST-Bench captures the challenges of translating complex projects with dependencies across multiple files. The Rust interfaces serve as explicit specifications that ensure adherence to idiomatic, memory-safe Rust patterns, while the accompanying test cases enforce functional correctness. We evaluate state-of-the-art large language models (LLMs) on this task and find that generating safe, idiomatic Rust remains challenging for a range of state-of-the-art methods and techniques. We also provide insights into the errors LLMs typically make when transpiling C to safe Rust. The best-performing model, OpenAI o1, solves only 15 tasks in a single-shot setting. Improvements on CRUST-Bench would lead to better transpilation systems that can reason about complex scenarios and help migrate legacy codebases from C into memory-safe languages such as Rust. The dataset and code are available at https://github.com/anirudhkhatry/CRUST-bench.
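The following minimal sketch illustrates how a safe Rust interface can act as a specification and how a test case then checks the transpiled code. It is a hypothetical example and is not drawn from CRUST-Bench. The `IntStack` C header and all Rust names are invented for illustration. The method signatures stand in for the provided safe Rust interface. The bodies are one possible candidate transpilation. The `#[test]` function stands in for an accompanying test case.

```rust
// Hypothetical example for illustration only; not taken from CRUST-Bench.
//
// C header being migrated (shown for reference):
//   typedef struct { int *items; size_t len; size_t cap; } IntStack;
//   IntStack *stack_new(void);
//   int  stack_push(IntStack *s, int v);   /* 0 on success, -1 on OOM   */
//   int  stack_pop(IntStack *s, int *out); /* 0 on success, -1 if empty */
//   void stack_free(IntStack *s);
//
// Safe Rust interface: ownership replaces stack_free, Option replaces the
// out-parameter plus error code, and no `unsafe` block is needed.

#[derive(Debug, Default)]
pub struct IntStack {
    // Vec tracks both length and capacity, replacing the manual fields.
    items: Vec<i32>,
}

impl IntStack {
    /// Counterpart of `stack_new`; storage is freed automatically on drop.
    pub fn new() -> Self {
        IntStack { items: Vec::new() }
    }

    /// Counterpart of `stack_push`. Vec handles growth; allocation failure
    /// aborts the process instead of returning -1.
    pub fn push(&mut self, v: i32) {
        self.items.push(v);
    }

    /// Counterpart of `stack_pop`: `None` replaces the -1 "empty" code.
    pub fn pop(&mut self) -> Option<i32> {
        self.items.pop()
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }

    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    // A test in the style of a benchmark test case: it exercises only the
    // public interface, so any correct transpilation must pass it.
    #[test]
    fn push_then_pop_is_lifo() {
        let mut s = IntStack::new();
        s.push(1);
        s.push(2);
        assert_eq!(s.pop(), Some(2));
        assert_eq!(s.pop(), Some(1));
        assert_eq!(s.pop(), None);
    }
}
```

The interface moves resource cleanup and error signaling into the type system. Dropping an `IntStack` frees its storage, and `pop` returns `Option<i32>` instead of writing through a raw out-pointer. A transpiler therefore cannot satisfy the signatures with pointer-based code. It has to produce an idiomatic, memory-safe design.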
