MIGRATION-BENCH: Repository-Level Code Migration Benchmark from Java 8

Abstract

With the rapid advancement of powerful large language models (LLMs) in recent years, a wide range of software engineering tasks can now be addressed using LLMs, significantly enhancing productivity and scalability. Numerous benchmark datasets have been developed to evaluate the coding capabilities of these models, but they primarily focus on problem-solving and issue-resolution tasks. In contrast, we introduce a new coding benchmark, MIGRATION-BENCH, with a distinct focus: code migration. MIGRATION-BENCH aims to serve as a comprehensive benchmark for migration from Java 8 to the latest long-term support (LTS) versions (Java 17 and 21). It includes a full dataset and a selected subset, with 5,102 and 300 repositories respectively. The selected subset is a representative sample curated for complexity and difficulty, offering a versatile resource to support research in the field of code migration. Additionally, we provide a comprehensive evaluation framework to facilitate rigorous and standardized assessment of LLMs on this challenging task. We further propose SD-Feedback and demonstrate that LLMs can effectively tackle repository-level code migration to Java 17. On the selected subset with Claude-3.5-Sonnet-v2, SD-Feedback achieves success rates (pass@1) of 62.33% and 27.00% for minimal and maximal migration respectively. The benchmark dataset and source code are available at https://huggingface.co/collections/AmazonScience and https://github.com/amazon-science/self_debug respectively.
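
The reported pass@1 figures can be read as simple fractions of the 300-repository selected subset. The sketch below is illustrative only: the helper `pass_at_1` and the success counts 187 and 81 are assumptions, back-derived from the reported percentages under the assumption that each repository is attempted once and all 300 are evaluated. The abstract itself states only the percentages.

```python
def pass_at_1(successes: int, total: int) -> float:
    """Percentage of repositories migrated successfully on the first attempt."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * successes / total, 2)


SELECTED_SUBSET_SIZE = 300  # size of the selected subset, from the abstract

# Hypothetical counts, inferred from the reported rates (not given in the source).
minimal_successes = 187
maximal_successes = 81

print(pass_at_1(minimal_successes, SELECTED_SUBSET_SIZE))  # 62.33
print(pass_at_1(maximal_successes, SELECTED_SUBSET_SIZE))  # 27.0
```

Under these assumptions, the gap between the two rates corresponds to roughly 106 repositories that pass minimal migration but not maximal migration.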
