Tabular LLMs for Interpretable Few-Shot Alzheimer's Disease Prediction with Multimodal Biomedical Data

Abstract

Accurate diagnosis of Alzheimer's disease (AD) requires handling tabular biomarker data, yet such data are often small and incomplete, a regime in which deep learning models frequently fail to outperform classical methods. Pretrained large language models (LLMs) offer few-shot generalization, structured reasoning, and interpretable outputs, providing a powerful paradigm shift for clinical prediction. We propose TAP-GPT (Tabular Alzheimer's Prediction GPT), a domain-adapted tabular LLM framework built on TableGPT2 and fine-tuned for few-shot AD classification using tabular prompts rather than plain text. We evaluate TAP-GPT on binary AD classification across four ADNI-derived datasets: QT-PAD biomarkers and region-level structural MRI, amyloid PET, and tau PET. Across multimodal and unimodal settings, TAP-GPT improves upon its backbone models and outperforms traditional machine learning baselines in the few-shot setting while remaining competitive with state-of-the-art general-purpose LLMs. We show that feature selection mitigates performance degradation on high-dimensional inputs and that TAP-GPT maintains stable performance under both simulated and real-world missingness without imputation. Additionally, TAP-GPT produces structured, modality-aware reasoning aligned with established AD biology and shows greater stability under self-reflection, supporting its use in iterative multi-agent systems. To our knowledge, this is the first systematic application of a tabular-specialized LLM to multimodal biomarker-based AD prediction. It demonstrates that such pretrained models can effectively address structured clinical prediction tasks and lays the foundation for tabular LLM-driven multi-agent clinical decision-support systems. The source code is publicly available on GitHub: https://github.com/sophie-kearney/TAP-GPT.
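To make the idea of a few-shot tabular prompt concrete, the following is a minimal sketch of how labeled example rows and one unlabeled query row might be serialized as a table, with missing values left as empty cells instead of being imputed. It is not the TAP-GPT implementation: the function names, the column names (age, mmse, hippocampus_volume), the "AD"/"non-AD" label strings, the instruction wording, and all cell values are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence

# One participant: column name -> value, or None when the measurement is missing.
Row = dict[str, Optional[float]]


def format_cell(value: Optional[float]) -> str:
    """Render one cell; a missing value stays blank (no imputation)."""
    return "" if value is None else f"{value:g}"


def build_tabular_prompt(
    columns: Sequence[str],
    examples: Sequence[Row],
    labels: Sequence[str],
    query: Row,
) -> str:
    """Serialize labeled few-shot rows plus one query row as a pipe-delimited table."""
    header = "| " + " | ".join([*columns, "label"]) + " |"
    rule = "|" + "---|" * (len(columns) + 1)
    lines = [header, rule]
    for row, label in zip(examples, labels):
        cells = [format_cell(row.get(c)) for c in columns] + [label]
        lines.append("| " + " | ".join(cells) + " |")
    # The query row carries "?" in the label column for the model to fill in.
    cells = [format_cell(query.get(c)) for c in columns] + ["?"]
    lines.append("| " + " | ".join(cells) + " |")
    # Hypothetical instruction text, not taken from the paper.
    instruction = "Each row is one participant. Predict the label (AD or non-AD) for the last row."
    return instruction + "\n\n" + "\n".join(lines)


# Toy, made-up values purely for illustration.
columns = ["age", "mmse", "hippocampus_volume"]
examples = [
    {"age": 71, "mmse": 29, "hippocampus_volume": 7.1},
    {"age": 78, "mmse": 21, "hippocampus_volume": None},
]
labels = ["non-AD", "AD"]
query = {"age": 74, "mmse": 26, "hippocampus_volume": None}

prompt = build_tabular_prompt(columns, examples, labels, query)
print(prompt)
```

The resulting string would be passed to a tabular LLM as the prompt. Keeping missing cells blank mirrors the abstract's claim of operating without imputation, although the exact serialization used by TAP-GPT may differ.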
