BioTool: A Comprehensive Tool-Calling Dataset for Enhancing Biomedical Capabilities of Large Language Models

May 7, 2026
Authors: Xin Gao, Ruiyi Zhang, Meixi Du, Peijia Qin, Pengtao Xie
cs.AI

Abstract

Despite the success of large language models (LLMs) on general-purpose tasks, their performance in highly specialized domains such as biomedicine remains unsatisfactory. A key limitation is the inability of LLMs to effectively leverage biomedical tools, which clinical experts and biomedical researchers rely on extensively in daily workflows. While recent general-domain tool-calling datasets have substantially improved the capabilities of LLM agents, existing efforts in the biomedical domain largely rely on in-context learning and restrict models to a small set of tools. To address this gap, we introduce BioTool, a comprehensive biomedical tool-calling dataset designed for fine-tuning LLMs. BioTool comprises 34 frequently used tools collected from the NCBI, Ensembl, and UniProt databases, along with 7,040 high-quality, human-verified query-API call pairs spanning variation, genomics, proteomics, evolution, and general biology. Fine-tuning a 4-billion-parameter LLM on BioTool yields substantial improvements in biomedical tool-calling performance, outperforming cutting-edge commercial LLMs such as GPT-5.1. Furthermore, human expert evaluations demonstrate that integrating a BioTool-fine-tuned tool caller significantly improves downstream answer quality compared to the same LLM without tool usage, highlighting the effectiveness of BioTool in enhancing the biomedical capabilities of LLMs. The full dataset and evaluation code are available at https://github.com/gxx27/BioTool
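To make the idea of a query-API call pair concrete, the following is a minimal sketch of what one such record and a simple scoring check might look like. The record layout, the tool name `ensembl_lookup_symbol`, the `ToolCall` class, and the exact-match check are all illustrative assumptions and are not taken from the BioTool release; only the Ensembl REST `lookup/symbol` endpoint itself is a real public API. The sketch builds the URL but does not send any request.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass
class ToolCall:
    """A structured API call. Hypothetical schema, not BioTool's actual format."""
    tool: str                                   # illustrative tool name
    params: dict = field(default_factory=dict)  # arguments for that tool


# One illustrative query/call pair; the field names are assumptions.
EXAMPLE_PAIR = {
    "query": "Which chromosome is the human BRCA2 gene located on?",
    "call": ToolCall(
        tool="ensembl_lookup_symbol",
        params={"species": "homo_sapiens", "symbol": "BRCA2"},
    ),
}


def to_url(call: ToolCall) -> str:
    """Render a call as an Ensembl REST URL (only the one illustrative tool is handled)."""
    if call.tool != "ensembl_lookup_symbol":
        raise ValueError(f"unknown tool: {call.tool}")
    path = f"/lookup/symbol/{call.params['species']}/{call.params['symbol']}"
    query = urlencode({"content-type": "application/json"})
    return f"https://rest.ensembl.org{path}?{query}"


def exact_match(predicted: ToolCall, reference: ToolCall) -> bool:
    """Strict check: same tool and identical parameters (an assumed metric)."""
    return predicted.tool == reference.tool and predicted.params == reference.params


if __name__ == "__main__":
    reference = EXAMPLE_PAIR["call"]
    print(to_url(reference))
    # A prediction with the wrong gene symbol fails the strict check.
    wrong = ToolCall("ensembl_lookup_symbol", {"species": "homo_sapiens", "symbol": "BRCA1"})
    print(exact_match(reference, reference), exact_match(wrong, reference))
```

A strict exact-match check like this is easy to compute but penalizes calls that differ only in harmless ways, such as optional parameters; the released evaluation code should be consulted for the metric the authors actually use.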