Tabular LLMs for Interpretable Few-Shot Alzheimer's Disease Prediction with Multimodal Biomedical Data

Abstract

Accurate diagnosis of Alzheimer's disease (AD) requires handling tabular biomarker data, yet such data are often small and incomplete, a regime in which deep learning models frequently fail to outperform classical methods. Pretrained large language models (LLMs) offer few-shot generalization, structured reasoning, and interpretable outputs, making them a powerful new paradigm for clinical prediction. We propose TAP-GPT (Tabular Alzheimer's Prediction GPT), a domain-adapted tabular LLM framework built on TableGPT2 and fine-tuned for few-shot AD classification using tabular prompts rather than plain text. We evaluate TAP-GPT on binary AD classification across four ADNI-derived datasets: QT-PAD biomarkers, region-level structural MRI, amyloid PET, and tau PET. Across multimodal and unimodal settings, TAP-GPT improves upon its backbone models and outperforms traditional machine learning baselines in the few-shot setting while remaining competitive with state-of-the-art general-purpose LLMs. We show that feature selection mitigates performance degradation on high-dimensional inputs and that TAP-GPT maintains stable performance under both simulated and real-world missingness without imputation. Additionally, TAP-GPT produces structured, modality-aware reasoning aligned with established AD biology and shows greater stability under self-reflection, supporting its use in iterative multi-agent systems. To our knowledge, this is the first systematic application of a tabular-specialized LLM to multimodal biomarker-based AD prediction. It demonstrates that such pretrained models can effectively address structured clinical prediction tasks, and it lays the foundation for tabular LLM-driven multi-agent clinical decision-support systems. The source code is publicly available on GitHub: https://github.com/sophie-kearney/TAP-GPT.
