BioTool: A Comprehensive Tool-Calling Dataset for Enhancing Biomedical Capabilities of Large Language Models
May 7, 2026
Authors: Xin Gao, Ruiyi Zhang, Meixi Du, Peijia Qin, Pengtao Xie
cs.AI
Abstract
Despite the success of large language models (LLMs) on general-purpose tasks, their performance in highly specialized domains such as biomedicine remains unsatisfactory. A key limitation is the inability of LLMs to effectively leverage biomedical tools, which clinical experts and biomedical researchers rely on extensively in daily workflows. While recent general-domain tool-calling datasets have substantially improved the capabilities of LLM agents, existing efforts in the biomedical domain largely rely on in-context learning and restrict models to a small set of tools. To address this gap, we introduce BioTool, a comprehensive biomedical tool-calling dataset designed for fine-tuning LLMs. BioTool comprises 34 frequently used tools collected from the NCBI, Ensembl, and UniProt databases, along with 7,040 high-quality, human-verified query-API call pairs spanning variation, genomics, proteomics, evolution, and general biology. Fine-tuning a 4-billion-parameter LLM on BioTool yields substantial improvements in biomedical tool-calling performance, outperforming cutting-edge commercial LLMs such as GPT-5.1. Furthermore, human expert evaluations demonstrate that integrating a BioTool-fine-tuned tool caller significantly improves downstream answer quality compared to the same LLM without tool usage, highlighting the effectiveness of BioTool in enhancing the biomedical capabilities of LLMs. The full dataset and evaluation code are available at https://github.com/gxx27/BioTool
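To make the idea of a "query-API call pair" concrete, here is a minimal sketch of how one such pair could be represented. The class names, field names, and example query are hypothetical and are not taken from the BioTool dataset, whose actual schema is not described in the abstract. The only external detail assumed is the public Ensembl REST lookup-by-symbol endpoint. The sketch only builds the request URL and does not send it.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass
class ToolCall:
    """Hypothetical representation of one API call (not BioTool's actual schema)."""
    base_url: str
    path: str
    params: dict = field(default_factory=dict)

    def to_url(self) -> str:
        # Keep "/" unescaped so the content type stays readable in the URL.
        query = urlencode(self.params, safe="/")
        return f"{self.base_url}{self.path}" + (f"?{query}" if query else "")


@dataclass
class QueryCallPair:
    """Hypothetical pairing of a natural-language query with its target API call."""
    query: str
    call: ToolCall


# Illustrative example only: a genomics-style question mapped to an
# Ensembl REST lookup-by-symbol request.
pair = QueryCallPair(
    query="Where is the human BRCA2 gene located in the genome?",
    call=ToolCall(
        base_url="https://rest.ensembl.org",
        path="/lookup/symbol/homo_sapiens/BRCA2",
        params={"content-type": "application/json"},
    ),
)

print(pair.query)
print(pair.call.to_url())
```

In a fine-tuning setup of the kind the abstract describes, the query would presumably serve as the model input and a serialized form of the call as the target output. The real serialization format should be checked in the repository linked above.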