MIGRATION-BENCH: Repository-Level Code Migration Benchmark from Java 8

Abstract

With the rapid advancement of powerful large language models (LLMs) in recent years, a wide range of software engineering tasks can now be addressed using LLMs, significantly enhancing productivity and scalability. Numerous benchmark datasets have been developed to evaluate the coding capabilities of these models, but they primarily focus on problem-solving and issue-resolution tasks. In contrast, we introduce a new coding benchmark, MIGRATION-BENCH, with a distinct focus: code migration. MIGRATION-BENCH aims to serve as a comprehensive benchmark for migration from Java 8 to the latest long-term support (LTS) versions (Java 17, 21). It includes a full dataset and a subset called selected, with 5,102 and 300 repositories respectively. The selected subset is representative and curated for complexity and difficulty, offering a versatile resource to support research in the field of code migration. Additionally, we provide a comprehensive evaluation framework to facilitate rigorous and standardized assessment of LLMs on this challenging task. We further propose SD-Feedback and demonstrate that LLMs can effectively tackle repository-level code migration to Java 17. On the selected subset with Claude-3.5-Sonnet-v2, SD-Feedback achieves success rates (pass@1) of 62.33% and 27.00% for minimal and maximal migration respectively. The benchmark dataset and source code are available at https://huggingface.co/collections/AmazonScience and https://github.com/amazon-science/self_debug respectively.
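To make the reported success rates concrete, the following sketch relates a pass@1 percentage to a repository count. It assumes that pass@1 here is simply the share of repositories migrated successfully in a single attempt. The function names `pass_at_1` and `implied_successes` are illustrative and are not part of the released evaluation framework, and the implied counts are back-calculated from the percentages in the abstract rather than reported by the authors.

```python
def pass_at_1(outcomes):
    """Percentage of repositories whose single migration attempt succeeded.

    `outcomes` holds one boolean per repository (True = migration passed).
    This is an assumed, simplified reading of pass@1 with one attempt each.
    """
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    return 100.0 * sum(1 for ok in outcomes if ok) / len(outcomes)


def implied_successes(rate_percent, total):
    """Nearest whole number of successes consistent with a reported rate."""
    return round(rate_percent / 100.0 * total)


SUBSET_SIZE = 300  # size of the selected subset, as stated in the abstract

# Back-calculated counts (an inference, not figures quoted from the paper).
minimal = implied_successes(62.33, SUBSET_SIZE)
maximal = implied_successes(27.00, SUBSET_SIZE)
```

Under these assumptions, `minimal` evaluates to 187 and `maximal` to 81, and feeding 187 successes out of 300 back into `pass_at_1` reproduces 62.33% after rounding to two decimal places.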

