Unbalanced data processing using oversampling: Machine Learning
Journal article
2020
Corporación Universidad de la Costa
Deep learning (DL) algorithms currently perform well on a range of problems that share characteristics such as large volumes of data and high dimensionality. However, a major open challenge is the classification of high-dimensional databases that have very few samples and a high class imbalance. Biomedical gene expression microarray databases exhibit exactly these characteristics: class imbalance, few samples and high dimensionality. Class imbalance arises when the set of samples belonging to one class is much larger than the set of samples belonging to the other class or classes, and it has been identified as one of the main challenges for algorithms applied in the context of Big Data. The objective of this research is to study gene expression databases using conventional undersampling and oversampling methods for class balancing, namely random undersampling (RUS), random oversampling (ROS) and the Synthetic Minority Over-sampling Technique (SMOTE). The databases were modified in two ways: by increasing their imbalance and by generating artificial noise.
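The abstract names three conventional balancing methods without detailing them. A minimal, dependency-free Python sketch of their common textbook forms follows, assuming a dataset stored as a list of numeric feature rows `X` with class labels `y`. The function names and parameters (`seed`, `k`, `minority`) are hypothetical illustrations, not the authors' code, and the SMOTE variant is simplified (it picks a random minority sample for each synthetic point). It does not reproduce the paper's experimental setup.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (1.0 means balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def sqdist(a, b):
    """Squared Euclidean distance between two feature rows."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def random_undersample(X, y, seed=0):
    """RUS: randomly keep only as many samples per class as the smallest class has."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    out_X, out_y = [], []
    for cls in counts:
        idx = [i for i, lab in enumerate(y) if lab == cls]
        for i in rng.sample(idx, target):  # sampling without replacement
            out_X.append(list(X[i]))
            out_y.append(cls)
    return out_X, out_y


def random_oversample(X, y, seed=0):
    """ROS: duplicate random samples of smaller classes until all match the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    out_X, out_y = [list(r) for r in X], list(y)
    for cls, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == cls]
        for _ in range(target - n):
            i = rng.choice(idx)  # exact copy of an existing sample
            out_X.append(list(X[i]))
            out_y.append(cls)
    return out_X, out_y


def smote(X, y, minority, k=5, seed=0):
    """Simplified SMOTE: synthesise minority samples on the line segment between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    counts = Counter(y)
    needed = max(counts.values()) - counts[minority]
    pool = [X[i] for i, lab in enumerate(y) if lab == minority]
    if len(pool) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(pool) - 1)
    out_X, out_y = [list(r) for r in X], list(y)
    for _ in range(needed):
        b = rng.randrange(len(pool))
        base = pool[b]
        # k nearest minority neighbours of base (excluding base itself)
        others = sorted((j for j in range(len(pool)) if j != b),
                        key=lambda j: sqdist(base, pool[j]))
        neighbour = pool[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        out_X.append([bi + gap * (ni - bi) for bi, ni in zip(base, neighbour)])
        out_y.append(minority)
    return out_X, out_y
```

The design difference is the main point: RUS discards majority information, ROS repeats minority samples verbatim (risking overfitting), and SMOTE creates new, interpolated minority samples. In practice, the imbalanced-learn library offers `RandomUnderSampler`, `RandomOverSampler` and `SMOTE` classes with more complete implementations.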
- Scientific articles [3154]
Description:
Unbalanced data processing using oversampling, Machine Learning.pdf
Title: Unbalanced data processing using oversampling, Machine Learning.pdf
Size: 495.6 KB