Information Theory Approaches for Automated Feature Selection
Feature Selection, Information Theory
One of the main problems faced by machine learning algorithms is high dimensionality. With the rapid growth of complex data in real-world scenarios, attribute selection has become a mandatory pre-processing step in any application to reduce data complexity and computational time. Accordingly, several works have proposed efficient methods for this task. Most attribute selection methods select the best attributes according to a specific criterion. In addition, recent studies have successfully built models that select attributes according to the particularities of the data, assuming that similar samples should be treated separately. Although some progress has been made, a poor choice of a single algorithm or criterion to assess attribute importance, and an arbitrary choice of the number of attributes by the user, can lead to poor analysis. To overcome some of these issues, this paper presents two strands of automated attribute selection approaches. The first comprises fusion methods for multiple attribute selection algorithms, which use ranking-based strategies and classifier committees to combine attribute selection algorithms at the data level (Data Fusion) and at the decision level (Decision Fusion), allowing researchers to consider different perspectives in the attribute selection step. The second (PF-DFS) improves a dynamic selection algorithm (DFS) using the idea of Pareto frontier multiobjective optimization, which makes it possible to consider different perspectives on the relevance of the attributes and to define automatically the number of attributes to select. The proposed approaches were tested on more than 15 real and artificial databases, and the results showed that, compared with individual selection methods such as the original DFS itself, the performance of one of the proposed methods is notably higher.
In fact, the results are promising, since the proposed approaches also achieved superior performance when compared with established dimensionality reduction methods and with the use of the original data sets, showing that removing noisy and/or redundant attributes can have a positive effect on the performance of classification tasks.
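The Pareto frontier idea mentioned above can be illustrated with a minimal sketch. This is not the PF-DFS algorithm itself, which also involves dynamic selection; it only shows how non-dominance across several relevance criteria yields a set of attributes whose size is not fixed in advance. The function name `pareto_front` and the scores are hypothetical, and each criterion is assumed to be "higher is better".

```python
def pareto_front(scores):
    """Return the indices of non-dominated rows.

    Each row holds one attribute's scores under several relevance criteria
    (assumption: higher is better on every criterion). Row b dominates row a
    if b is at least as good on every criterion and strictly better on one.
    """
    front = []
    for i, a in enumerate(scores):
        dominated = any(
            all(bk >= ak for ak, bk in zip(a, b))
            and any(bk > ak for ak, bk in zip(a, b))
            for j, b in enumerate(scores)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Hypothetical relevance scores for five attributes under two criteria.
scores = [
    (0.90, 0.20),  # attribute 0: best on criterion 1
    (0.60, 0.70),  # attribute 1: a balanced trade-off
    (0.30, 0.95),  # attribute 2: best on criterion 2
    (0.50, 0.50),  # attribute 3: dominated by attribute 1
    (0.10, 0.10),  # attribute 4: dominated by every other attribute
]

selected = pareto_front(scores)
print(selected)  # [0, 1, 2]
```

In this toy case three attributes are kept without the user choosing that number: the size of the selected set follows from the shape of the frontier.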