9th World Conference on Information Systems and Technologies

Benchmark of Encoders of Nominal Features For Regression

Mixed-type data is common in the real world. However, many supervised learning algorithms, such as support vector machines and neural networks, can only process numerical features. One may choose to drop qualitative features, at the risk of losing information. A better alternative is to encode them as new numerical features. Working under constraints of time, budget, and computational resources, we searched for a general-purpose encoder but found the existing benchmarks to be limited. We review these limitations and present an alternative. Our benchmark tests 16 encoding methods on 15 regression datasets, using 7 distinct predictive models. The top general-purpose encoders were found to be Catboost, LeaveOneOut, and Target.
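
To illustrate what encoding a nominal feature as a numerical one can look like for regression, here is a minimal sketch of two of the ideas named above: target (mean) encoding and a leave-one-out variant. It is not the benchmark's implementation. The function names, the toy data, and the fallback to the global mean for categories seen only once are assumptions made for this example.

```python
from collections import defaultdict


def _category_stats(categories, targets):
    """Return per-category target sums and row counts."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    return sums, counts


def target_encode(categories, targets):
    """Replace each category with the mean target over all rows of that category."""
    sums, counts = _category_stats(categories, targets)
    return [sums[c] / counts[c] for c in categories]


def leave_one_out_encode(categories, targets):
    """Like target_encode, but each row's own target is excluded from its mean.

    Assumption for this sketch: a category that appears only once falls back
    to the global target mean, since no other rows are available.
    """
    sums, counts = _category_stats(categories, targets)
    global_mean = sum(targets) / len(targets)
    encoded = []
    for category, target in zip(categories, targets):
        if counts[category] > 1:
            encoded.append((sums[category] - target) / (counts[category] - 1))
        else:
            encoded.append(global_mean)
    return encoded


# Hypothetical toy data: a nominal feature and a numeric regression target.
colors = ["red", "red", "blue", "blue", "blue", "green"]
prices = [10.0, 14.0, 20.0, 26.0, 23.0, 5.0]

mean_encoded = target_encode(colors, prices)
loo_encoded = leave_one_out_encode(colors, prices)
print(mean_encoded)
print(loo_encoded)
```

Excluding a row's own target from its encoding keeps the encoded value from leaking that row's label into its feature, which is the usual motivation for leave-one-out style variants on training data.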

Diogo Seca
INESC TEC
Portugal

João Mendes-Moreira
FEUP, INESC TEC
Portugal

 

