BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge
August 31, 2023
Authors: Xiangru Tang, Bill Qian, Rick Gao, Jiakang Chen, Xinyun Chen, Mark Gerstein
cs.AI
Abstract
Pre-trained language models like ChatGPT have significantly improved code
generation. As these models scale up, there is an increasing need for the
output to handle more intricate tasks. Moreover, in bioinformatics, generating
functional programs poses additional notable challenges due to the amount of
domain knowledge, the need for complicated data operations, and intricate
functional dependencies between the operations. Here, we present BioCoder, a
benchmark developed to evaluate existing pre-trained models in generating
bioinformatics code. In relation to function-code generation, BioCoder covers
potential package dependencies, class declarations, and global variables. It
incorporates 1026 functions and 1243 methods in Python and Java from GitHub and
253 examples from the Rosalind Project. BioCoder incorporates a fuzz-testing
framework for evaluation, and we have applied it to evaluate many models
including InCoder, CodeGen, CodeGen2, SantaCoder, StarCoder, StarCoder+,
InstructCodeT5+, and ChatGPT. Our detailed analysis of these models emphasizes
the importance of domain knowledge, pragmatic code generation, and contextual
understanding. Our dataset, benchmark, Docker images, and scripts required for
testing are all available at https://github.com/gersteinlab/biocoder.
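As an illustration of the kind of fuzz-testing evaluation the abstract describes, the sketch below checks a generated function against randomized inputs and simple invariants. This is a minimal, hypothetical example: `gc_content` and its invariants are assumptions for illustration, not part of the BioCoder harness itself.

```python
import random

def gc_content(seq: str) -> float:
    """Hypothetical generated function: fraction of G/C bases in a DNA sequence."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)

def fuzz_gc_content(trials: int = 1000, seed: int = 0) -> None:
    """Feed random DNA strings to the function and check basic invariants,
    in the spirit of fuzz-testing generated code rather than relying on a
    few hand-written cases."""
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(0, 200)
        seq = "".join(rng.choice("ACGT") for _ in range(length))
        value = gc_content(seq)
        assert 0.0 <= value <= 1.0             # result is a valid fraction
        assert gc_content(seq[::-1]) == value  # base order must not matter

fuzz_gc_content()
print("all fuzz trials passed")
```

Randomized inputs of varying length tend to expose edge cases (e.g. the empty sequence) that fixed unit tests for generated code often miss.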