Unsupervised Clustering and Classification of Data Streams Based on Typicality and Eccentricity
outlier detection, data clustering, data classification, data streams, TEDA, typicality, eccentricity, Data Cloud, Auto-Cloud.
In this thesis we propose a new approach to unsupervised data clustering and classification. The approach is based on the concepts of typicality and eccentricity, which are used by the recently introduced TEDA algorithm for outlier detection. To perform data clustering and classification, we propose a new statistical algorithm called Auto-Cloud. The data samples analyzed by Auto-Cloud are grouped into units called Data Clouds, which are structures without a predefined shape or boundaries. Auto-Cloud allows each data sample to belong to multiple Data Clouds simultaneously. Auto-Cloud is an autonomous algorithm that requires neither previous training nor any prior knowledge about the data set. It creates and merges Data Clouds autonomously as data samples are obtained. The algorithm is suitable for clustering and classification of online data streams and for applications that require real-time response. Auto-Cloud is also recursive, which makes it fast and keeps its computational effort low. The classification process uses a measure of relevance between each sample and each Data Cloud created in the clustering process. The class of each sample is determined by the Data Cloud with the highest relevance with respect to that sample. To validate the proposed method, we apply it to fault detection in industrial processes, using real data obtained from two industrial plants.
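To illustrate the kind of recursive computation that typicality and eccentricity rely on, the following Python sketch updates a running mean and variance one sample at a time and derives an eccentricity value from them. It is not the Auto-Cloud algorithm itself. It assumes the recursive mean, variance, and eccentricity formulas commonly published for TEDA, together with a Chebyshev-style threshold with m = 3. The names `RecursiveEccentricity` and `is_outlier` are hypothetical and chosen only for this example.

```python
import numpy as np


class RecursiveEccentricity:
    """Running mean/variance with a TEDA-style eccentricity (illustrative sketch)."""

    def __init__(self):
        self.k = 0        # number of samples seen so far
        self.mean = None  # running mean vector
        self.var = 0.0    # running scalar variance

    def update(self, x):
        """Absorb one sample and return its eccentricity."""
        x = np.asarray(x, dtype=float)
        self.k += 1
        k = self.k
        if k == 1:
            # Assumption: a single sample gets eccentricity 1.
            self.mean = x.copy()
            self.var = 0.0
            return 1.0
        # Recursive updates: no past samples are stored.
        self.mean = ((k - 1) / k) * self.mean + x / k
        dist2 = float(np.sum((x - self.mean) ** 2))
        self.var = ((k - 1) / k) * self.var + dist2 / (k - 1)
        if self.var == 0.0:
            # All samples identical so far; only the 1/k term remains.
            return 1.0 / k
        return 1.0 / k + dist2 / (k * self.var)


def is_outlier(ecc, k, m=3.0):
    """Compare normalized eccentricity (ecc / 2) with (m^2 + 1) / (2k)."""
    return ecc / 2.0 > (m ** 2 + 1.0) / (2.0 * k)


if __name__ == "__main__":
    detector = RecursiveEccentricity()
    stream = [[float(i % 2)] for i in range(20)] + [[100.0]]
    for sample in stream:
        ecc = detector.update(sample)
        typicality = 1.0 - ecc  # typicality as the complement of eccentricity
        print(detector.k, sample, round(ecc, 4), is_outlier(ecc, detector.k))
```

Because each update touches only the sample count, the mean, and the variance, memory use stays constant regardless of stream length, which is the property that makes a recursive formulation attractive for online data streams.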