BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge
August 31, 2023
Authors: Xiangru Tang, Bill Qian, Rick Gao, Jiakang Chen, Xinyun Chen, Mark Gerstein
cs.AI
Abstract
Pre-trained language models like ChatGPT have significantly improved code
generation. As these models scale up, there is an increasing need for the
output to handle more intricate tasks. Moreover, in bioinformatics, generating
functional programs poses additional notable challenges due to the amount of
domain knowledge, the need for complicated data operations, and intricate
functional dependencies between the operations. Here, we present BioCoder, a
benchmark developed to evaluate existing pre-trained models in generating
bioinformatics code. In relation to function-code generation, BioCoder covers
potential package dependencies, class declarations, and global variables. It
incorporates 1026 functions and 1243 methods in Python and Java from GitHub and
253 examples from the Rosalind Project. BioCoder incorporates a fuzz-testing
framework for evaluation, and we have applied it to evaluate many models
including InCoder, CodeGen, CodeGen2, SantaCoder, StarCoder, StarCoder+,
InstructCodeT5+, and ChatGPT. Our detailed analysis of these models emphasizes
the importance of domain knowledge, pragmatic code generation, and contextual
understanding. Our dataset, benchmark, Docker images, and scripts required for
testing are all available at https://github.com/gersteinlab/biocoder.
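As an illustration of the kind of fuzz-testing evaluation the abstract describes, the sketch below checks a generated function against randomized inputs and simple invariants. This is a minimal, hypothetical example: `gc_content` and its invariants are assumptions for illustration, not part of the BioCoder harness itself.

```python
import random

def gc_content(seq: str) -> float:
    """Hypothetical generated function: fraction of G/C bases in a DNA sequence."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)

def fuzz_gc_content(trials: int = 1000, seed: int = 0) -> None:
    """Feed random DNA strings to the function and check basic invariants,
    in the spirit of fuzz-testing generated code rather than relying on a
    few hand-written cases."""
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(0, 200)
        seq = "".join(rng.choice("ACGT") for _ in range(length))
        value = gc_content(seq)
        assert 0.0 <= value <= 1.0             # result is a valid fraction
        assert gc_content(seq[::-1]) == value  # base order must not matter

fuzz_gc_content()
print("all fuzz trials passed")
```

Randomized inputs of varying length tend to expose edge cases (e.g. the empty sequence) that fixed unit tests for generated code often miss.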