LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers

March 18, 2025
Authors: Nikhil Abhyankar, Parshin Shojaee, Chandan K. Reddy
cs.AI

Abstract

Automated feature engineering plays a critical role in improving predictive model performance for tabular learning tasks. Traditional automated feature engineering methods are limited by their reliance on pre-defined transformations within fixed, manually designed search spaces, often neglecting domain knowledge. Recent advances using Large Language Models (LLMs) have enabled the integration of domain knowledge into the feature engineering process. However, existing LLM-based approaches use direct prompting or rely solely on validation scores for feature selection, failing to leverage insights from prior feature discovery experiments or establish meaningful reasoning between feature generation and data-driven performance. To address these challenges, we propose LLM-FE, a novel framework that combines evolutionary search with the domain knowledge and reasoning capabilities of LLMs to automatically discover effective features for tabular learning tasks. LLM-FE formulates feature engineering as a program search problem, where LLMs propose new feature transformation programs iteratively, and data-driven feedback guides the search process. Our results demonstrate that LLM-FE consistently outperforms state-of-the-art baselines, significantly enhancing the performance of tabular prediction models across diverse classification and regression benchmarks.
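
The program-search formulation in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the LLM call is replaced by a hypothetical stub (`propose`) that draws from a small hand-written pool (`CANDIDATE_POOL`), and the data-driven feedback is assumed to be the absolute Pearson correlation between a candidate feature and the target, computed on toy data. All names here are illustrative assumptions.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
Program = Tuple[str, Callable[[Row], float]]  # (description, transformation)

# Hypothetical pool standing in for feature programs an LLM might propose.
CANDIDATE_POOL: List[Program] = [
    ("a + b", lambda r: r["a"] + r["b"]),
    ("a - b", lambda r: r["a"] - r["b"]),
    ("a * b", lambda r: r["a"] * r["b"]),
    ("a ** 2", lambda r: r["a"] ** 2),
]


def pearson_abs(xs: List[float], ys: List[float]) -> float:
    """Absolute Pearson correlation; an assumed proxy for model feedback."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / math.sqrt(vx * vy)


def propose(population: List[Tuple[float, Program]], rng: random.Random) -> Program:
    """Stub for the LLM step: a real system would prompt with scored programs.

    Here the only use of prior results is to avoid re-proposing tried programs.
    """
    tried = {name for _, (name, _) in population}
    untried = [p for p in CANDIDATE_POOL if p[0] not in tried]
    return rng.choice(untried or CANDIDATE_POOL)


def evolve(rows: List[Row], target: List[float], iterations: int,
           seed: int = 0) -> List[Tuple[float, Program]]:
    """Iteratively propose, score, and rank feature programs."""
    rng = random.Random(seed)
    population: List[Tuple[float, Program]] = []
    for _ in range(iterations):
        program = propose(population, rng)
        feature = [program[1](row) for row in rows]
        score = pearson_abs(feature, target)
        population.append((score, program))
        # Keep the population sorted so the best program is always first.
        population.sort(key=lambda item: item[0], reverse=True)
    return population


# Toy data: the target is exactly the product of the two columns.
data_rng = random.Random(42)
rows = [{"a": data_rng.uniform(-1, 1), "b": data_rng.uniform(-1, 1)}
        for _ in range(200)]
target = [r["a"] * r["b"] for r in rows]

ranked = evolve(rows, target, iterations=len(CANDIDATE_POOL))
best_score, (best_name, _) = ranked[0]
print(best_name, round(best_score, 3))
```

In this toy setting the loop ranks the product feature first because it matches the target exactly. A fuller version would replace `propose` with an actual LLM prompt containing the top-scoring programs, and replace the correlation score with the validation performance of a downstream tabular model.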
