RelBench: A Benchmark for Deep Learning on Relational Databases
July 29, 2024
Authors: Joshua Robinson, Rishabh Ranjan, Weihua Hu, Kexin Huang, Jiaqi Han, Alejandro Dobles, Matthias Fey, Jan E. Lenssen, Yiwen Yuan, Zecheng Zhang, Xinwei He, Jure Leskovec
cs.AI
Abstract
We present RelBench, a public benchmark for solving predictive tasks over
relational databases with graph neural networks. RelBench provides databases
and tasks spanning diverse domains and scales, and is intended to be a
foundational infrastructure for future research. We use RelBench to conduct the
first comprehensive study of Relational Deep Learning (RDL) (Fey et al., 2024),
which combines graph neural network predictive models with (deep) tabular
models that extract initial entity-level representations from raw tables.
End-to-end learned RDL models fully exploit the predictive signal encoded in
primary-foreign key links, marking a significant shift away from the dominant
paradigm of manual feature engineering combined with tabular models. To
thoroughly evaluate RDL against this prior gold standard, we conduct an
in-depth user study where an experienced data scientist manually engineers
features for each task. In this study, RDL learns better models whilst reducing
human work needed by more than an order of magnitude. This demonstrates the
power of deep learning for solving predictive tasks over relational databases,
opening up many new research opportunities enabled by RelBench.
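
To make the idea of exploiting primary-foreign key links concrete, the sketch below builds graph edges from a foreign-key column and runs one mean-aggregation step from child rows to parent rows. It is a toy illustration only: the tables, column names, and the helpers `build_edges` and `mean_aggregate` are hypothetical, it does not use the RelBench package, and it is not the authors' implementation. A real RDL model would replace the fixed mean with learned graph neural network layers applied on top of learned tabular encoders.

```python
from collections import defaultdict

# Hypothetical toy tables; each row is a dict keyed by column name.
customers = [
    {"customer_id": 1, "age": 30.0},
    {"customer_id": 2, "age": 50.0},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 20.0},
    {"order_id": 11, "customer_id": 1, "amount": 40.0},
    {"order_id": 12, "customer_id": 2, "amount": 5.0},
]


def build_edges(child_rows, child_pk, fk):
    """Return (child_key, parent_key) edges, one per foreign-key reference."""
    return [(row[child_pk], row[fk]) for row in child_rows]


def mean_aggregate(edges, child_features):
    """Average child features per parent: one simplified message-passing step."""
    buckets = defaultdict(list)
    for child, parent in edges:
        buckets[parent].append(child_features[child])
    return {parent: sum(vals) / len(vals) for parent, vals in buckets.items()}


# Each order row points at its customer row through the foreign key.
edges = build_edges(orders, "order_id", "customer_id")
order_amount = {row["order_id"]: row["amount"] for row in orders}

# Each customer receives the mean order amount of its linked orders.
customer_signal = mean_aggregate(edges, order_amount)
print(customer_signal)  # {1: 30.0, 2: 5.0}
```

The aggregated value per customer is the kind of feature a data scientist would otherwise write by hand as a join and group-by; in RDL the aggregation is learned end to end rather than specified manually.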