An Unsupervised-based Feature Selection for Classification Tasks
Feature Selection, Classification, Clustering Algorithms
As the data sets used in classification systems grow in size, selecting
the most relevant attributes has become one of the main tasks of the pre-processing phase.
In a dataset, all attributes are expected to be relevant. However, this is not always
the case. Selecting a subset of the most relevant attributes helps reduce the size of the data
without affecting classification performance, and may even improve it, thus achieving better results
when the data is used for classification. Existing feature selection methods select the
best attributes for the dataset as a whole, without considering the particularities of
each instance. The proposed method, Unsupervised-based Feature Selection, selects the
relevant attributes for each instance individually, using clustering algorithms to group
them according to their similarities. This work presents an experimental analysis
of different clustering techniques applied to this new feature selection approach. The
clustering algorithms k-Means, DBSCAN and Expectation-Maximization (EM) were used
as selection methods. Analyses are performed to verify which of these clustering algorithms
best fits this new feature selection approach. Thus, the contribution of this study is to
present a new approach to attribute selection, in a Semidynamic and a Dynamic
version, and to determine which of the clustering methods performs the best selection and
leads to the construction of more accurate classifiers.
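
The general idea of selecting attributes per group of similar instances, rather than once for the whole dataset, can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: the function names (kmeans, select_per_cluster), the toy data, and in particular the scoring rule (keeping the attributes whose within-cluster variance is smallest relative to their global variance) are assumptions made only for illustration, and the Semidynamic and Dynamic versions are not modelled. A plain k-Means with a deterministic initialization stands in for the clustering step.

```python
from statistics import pvariance


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iters=20):
    """Plain k-means. Assumption: the first k rows serve as initial centroids,
    which keeps this toy example deterministic."""
    centroids = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: each instance goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(r, centroids[c])) for r in rows]
        # Update step: recompute each centroid; keep the old one if a cluster is empty.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_per_cluster(rows, labels, n_keep):
    """Hypothetical criterion: for each cluster, keep the n_keep attributes with
    the lowest ratio of within-cluster variance to global variance."""
    global_var = [pvariance(col) for col in zip(*rows)]
    selected = {}
    for c in sorted(set(labels)):
        members = [r for r, lab in zip(rows, labels) if lab == c]
        scored = []
        for j, col in enumerate(zip(*members)):
            ratio = pvariance(col) / global_var[j] if global_var[j] > 0 else float("inf")
            scored.append((ratio, j))
        selected[c] = sorted(j for _, j in sorted(scored)[:n_keep])
    return selected


# Toy data with three attributes and two groups of instances (made up for this sketch).
rows = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 10.0],
    [1.0, 2.0, 1.0],
    [2.0, 1.0, 11.0],
    [1.0, 4.0, 2.0],
    [4.0, 1.0, 12.0],
]

labels = kmeans(rows, k=2)
selected = select_per_cluster(rows, labels, n_keep=2)
# Each instance inherits the attribute subset of the cluster it belongs to.
per_instance = [selected[lab] for lab in labels]
print(labels)
print(selected)
```

On this toy data the two clusters receive different attribute subsets, which is the contrast with methods that choose a single subset for the whole dataset. Replacing kmeans with another clustering routine only requires that it return one label per instance.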