Flexible feature engineering using a network flow approach

Date

2024

Publisher

University of the Western Cape

Abstract

Feature engineering, a critical part of the data preparation and exploration phase in predictive modelling, involves transforming predictor variables to enhance interpretability and better understand their relationship with the response variable. In some cases, it also offers automatic handling of outliers and missing values. Many machine learning and data mining techniques perform better with discretised continuous variables or clustered levels of categorical variables, making feature engineering essential for improving the accuracy and robustness of predictive models. Furthermore, the feature engineering process often needs to incorporate business, operational, or best-practice constraints applicable to the final transformed predictor variables or newly created features. This thesis addresses two significant challenges in feature engineering. The first is the supervised discretisation of continuous predictors, which involves partitioning a predictor's domain into disjoint intervals while preserving a specified trend in the relationship with the response variable and adhering to side constraints.
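To make the idea of trend-preserving supervised discretisation concrete, the following is a minimal sketch only. It is not the network flow formulation developed in the thesis; it swaps in a simple greedy merge of adjacent bins (in the style of pool-adjacent-violators) and handles no side constraints. The function name `monotone_bins`, the parameter `n_initial`, and the toy data are hypothetical, and the sketch assumes a binary response and distinct predictor values.

```python
def monotone_bins(values, targets, n_initial=4):
    """Illustrative greedy sketch (not the thesis's method).

    Start from roughly equal-frequency bins, then merge adjacent bins
    until the event rate is non-decreasing across the intervals.
    Returns a list of (upper_edge, event_rate) pairs.
    """
    pairs = sorted(zip(values, targets))
    size = max(1, len(pairs) // n_initial)

    # Each initial bin is [upper_edge, count, events].
    bins = []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        bins.append([chunk[-1][0], len(chunk), sum(t for _, t in chunk)])

    # Merge a bin into its left neighbour whenever the trend is violated.
    merged = []
    for current in bins:
        merged.append(current)
        while (len(merged) > 1
               and merged[-2][2] / merged[-2][1] > merged[-1][2] / merged[-1][1]):
            last = merged.pop()
            merged[-1] = [last[0], merged[-1][1] + last[1], merged[-1][2] + last[2]]

    return [(edge, events / count) for edge, count, events in merged]


# Toy data (hypothetical): eight observations with a binary response.
result = monotone_bins([1, 2, 3, 4, 5, 6, 7, 8], [0, 1, 0, 0, 1, 0, 1, 1])
print(result)
```

Here the first two initial bins violate the increasing trend and are merged, leaving three intervals with non-decreasing event rates. A greedy merge of this kind gives no optimality guarantee and cannot express business or operational constraints, which are the requirements the abstract highlights.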

Keywords

Supervised discretisation, Continuous predictors, Categorical predictors, Scorecards, Predictive modelling

Citation