Unsupervised Automations for a Pareto-Front-based Dynamic Feature Selection
Data Preprocessing, Feature Selection, Data Analysis, Clustering Algorithms, Unsupervised Techniques.
Several feature selection strategies have been developed over the past decades, each using different criteria to select the most relevant features. Dynamic feature selection, however, has shown that using multiple criteria simultaneously to determine the best subset of features for similar instances can provide encouraging results. Although dynamic selection has alleviated some of the limitations of traditional selection methods, its exclusive use of supervised evaluation criteria and its manual definition of the number of groups limit the analysis of complex problems in unsupervised scenarios. In this context, this thesis proposes three strands of the Pareto-front-based dynamic feature selection approach (PF-DFS). The first adds unsupervised criteria to the base version of PF-DFS/M. The second (PF-DFS/P) and third (PF-DFS/A) strands are variations of the base version that provide, respectively, partial and full automation of the definition of the number of groups used in the preprocessing stage, by means of an ensemble of internal validation indices. Automating this hyperparameter replaces an arbitrary choice of the number of groups with mechanisms that can help researchers deal with unlabeled databases, or support a deeper analysis of labeled ones. Additionally, an analysis of PF-DFS under noisy data scenarios is proposed. The investigative analyses used real and artificial datasets to evaluate: (I) the performance of PF-DFS in terms of stability and robustness, (II) the behavior of PF-DFS with the inclusion of unsupervised evaluation criteria, and (III) the behavior of PF-DFS with partial and full automation of the number of groups.
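
To illustrate the idea of selecting features by several criteria at once, the following is a minimal sketch of a Pareto-front filter in Python. It is not the PF-DFS implementation: the feature names, the two criteria and their scores are hypothetical, and every criterion is assumed to be "higher is better". A feature is kept when no other feature is at least as good on all criteria and strictly better on at least one.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is at least as good as `b` on every criterion
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of the features that no other feature dominates."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    ]


# Hypothetical scores for one group of similar instances:
# (criterion_1, criterion_2), both assumed to be "higher is better".
scores = {
    "f1": (0.9, 0.2),
    "f2": (0.6, 0.6),
    "f3": (0.5, 0.5),  # worse than f2 on both criteria
    "f4": (0.1, 0.8),
}

selected = pareto_front(scores)
print(selected)
```

In a dynamic setting, a filter of this kind would be applied separately to each group of instances, so that different groups may end up with different feature subsets.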