NRGBoost: Energy-Based Generative Boosted Trees
October 4, 2024
Author: João Bravo
cs.AI
Abstract
Despite the rise to dominance of deep learning in unstructured data domains,
tree-based methods such as Random Forests (RF) and Gradient Boosted Decision
Trees (GBDT) are still the workhorses for handling discriminative tasks on
tabular data. We explore generative extensions of these popular algorithms with
a focus on explicitly modeling the data density (up to a normalization
constant), thus enabling other applications besides sampling. As our main
contribution we propose an energy-based generative boosting algorithm that is
analogous to the second-order boosting implemented in popular packages like
XGBoost. We show that, despite producing a generative model capable of handling
inference tasks over any input variable, our proposed algorithm can achieve
similar discriminative performance to GBDT on a number of real-world tabular
datasets, outperforming alternative generative approaches. At the same time, we
show that it is also competitive with neural network based models for sampling.
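
To illustrate why a density known only up to a normalization constant can still support inference over any input variable, here is a minimal sketch. It is not the NRGBoost algorithm: the function names and the toy log-density are hypothetical stand-ins for whatever unnormalized model one has. The idea is that when conditioning on all other variables, the unknown constant cancels, so a softmax over the candidate values of the target variable gives the conditional distribution.

```python
import math


def conditional_from_unnormalized(log_density, x, index, candidates):
    """Return p(x[index] = c | other entries of x) for each candidate c.

    `log_density` is any function giving log p(x) up to an additive constant.
    The constant cancels in the ratio, so it never has to be computed.
    """
    scores = []
    for c in candidates:
        x_c = list(x)
        x_c[index] = c  # substitute the candidate value for the target variable
        scores.append(log_density(x_c))
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def toy_log_density(x):
    # Hypothetical unnormalized model: favors rows where feature == label.
    feature, label = x
    return 2.0 if feature == label else 0.0


# Infer the second variable (treated as the label) given the first.
probs = conditional_from_unnormalized(toy_log_density, [1, 0], index=1, candidates=[0, 1])
```

The same function can target any position in `x`, which is the sense in which a single density model handles inference over any input variable. This sketch assumes the target variable takes a small discrete set of values.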