BiomedSQL: Text-to-SQL for Scientific Reasoning on Biomedical Knowledge Bases

Abstract

Biomedical researchers increasingly rely on large-scale structured databases for complex analytical tasks. However, current text-to-SQL systems often struggle to map qualitative scientific questions into executable SQL, particularly when implicit domain reasoning is required. We introduce BiomedSQL, the first benchmark explicitly designed to evaluate scientific reasoning in text-to-SQL generation over a real-world biomedical knowledge base. BiomedSQL comprises 68,000 question/SQL query/answer triples grounded in a harmonized BigQuery knowledge base that integrates gene-disease associations, causal inference from omics data, and drug approval records. Each question requires models to infer domain-specific criteria, such as genome-wide significance thresholds, effect directionality, or trial phase filtering, rather than rely on syntactic translation alone. We evaluate a range of open- and closed-source LLMs across prompting strategies and interaction paradigms. Our results reveal a substantial performance gap: GPT-o3-mini achieves 59.0% execution accuracy, while our custom multi-step agent, BMSQL, reaches 62.6%, both well below the expert baseline of 90.0%. BiomedSQL provides a new foundation for advancing text-to-SQL systems capable of supporting scientific discovery through robust reasoning over structured biomedical knowledge bases. Our dataset is publicly available at https://huggingface.co/datasets/NIH-CARD/BiomedSQL, and our code is open-source at https://github.com/NIH-CARD/biomedsql.
