Classification and features selection method for obesity level prediction

Obesity has become one of the world's largest health issues: rich and poor countries alike see the affected population grow every year. According to the World Health Organization (WHO), obesity and overweight are defined as abnormal or excessive fat accumulation that may impair health, and the prevalence of obesity has nearly tripled since 1975. Data mining has become a strong scientific field for analyzing large data sources and extracting new information about patterns and behaviors in a population. This study uses data mining techniques to build a model for obesity prediction, using a dataset based on a survey of college students in several countries. After the data were cleaned and transformed, four classification methods were applied (Logistic Model Tree (LMT), Random Forest (RF), Multi-Layer Perceptron (MLP) and Support Vector Machines (SVM)), together with the feature selection methods InfoGain, GainRatio, Chi-Square and Relief. Finally, cross-validation was used for the training and testing processes. The results showed that LMT achieved the best precision, 96.65%, compared with Random Forest (95.62%), MLP (94.41%) and SVM, reported as SMO (83.89%). The study concludes that LMT can be used with confidence to analyze obesity data and similar datasets.
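As an illustration of the InfoGain feature selection step named in the abstract, the following is a minimal sketch of how information gain can rank categorical features against a class label. It is not the study's implementation: the feature names (`family_history`, `smokes`), the toy records and the helper functions `entropy` and `info_gain` are all hypothetical and chosen only for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def info_gain(feature_values, labels):
    """Information gain of one categorical feature with respect to the class."""
    total = len(labels)
    # Group the class labels by the value the feature takes.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy records, NOT taken from the study's dataset.
toy = {
    "family_history": ["yes", "yes", "no", "no", "yes", "no"],
    "smokes": ["no", "yes", "no", "yes", "no", "yes"],
}
obesity_level = ["obese", "obese", "normal", "normal", "obese", "normal"]

# Rank features from most to least informative.
ranking = sorted(toy, key=lambda name: info_gain(toy[name], obesity_level),
                 reverse=True)
print(ranking)
```

In this toy data `family_history` separates the two classes perfectly, so it ranks first. A ranking like this would typically be used to keep only the top-scoring features before training a classifier.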

URI

https://hdl.handle.net/11323/8417

Source

Journal of Theoretical and Applied Information Technology

Collections