Dynamic clustering based on evolving system approach
Data Streams, Evolving Algorithms, Hidden Markov Models, Evolving Systems, TEDA
The main objective of this work is to devise an algorithm for processing time series and other ordered data sequences. These data sources can be regarded as continuous and theoretically infinite data streams. Treating time series and data sequences as data streams makes it possible to use evolving algorithms, which process the data in a single pass and extract knowledge cumulatively. The new algorithm is strongly inspired by hidden Markov models (HMMs) and by AutoCloud, an evolving algorithm for data clustering. AutoCloud is based on TEDA (Typicality and Eccentricity Data Analysis), which will also be used directly. AutoCloud will serve as the basis for modeling data patterns, analogous to the states of an HMM, and TEDA will be used to estimate the state transitions, yielding a model similar to a traditional HMM. First, modifications to AutoCloud will be proposed to improve its handling of concept drift and concept evolution, to refine the cluster merge operation, and to add a cluster split operation. Furthermore, a strategy will be defined to calculate the most typical transitions between clusters. The modified AutoCloud is expected to maintain its performance on known benchmarks while becoming more robust when dealing with new datasets.
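To make the two building blocks named above more concrete, the following is a minimal sketch and not the algorithm proposed in this work. It assumes the recursive mean, variance, and eccentricity updates as they are commonly stated for TEDA, together with the usual Chebyshev-style test on the normalized eccentricity with a sensitivity parameter m. The transition estimate uses plain frequency counting over a sequence of cluster labels as a stand-in, since the TEDA-based strategy for typical transitions is not specified here. The names `TedaState` and `transition_frequencies` are hypothetical.

```python
class TedaState:
    """Running statistics for one data cloud (illustrative, assumed TEDA recursions)."""

    def __init__(self):
        self.k = 0        # number of samples seen so far
        self.mean = []    # recursive mean vector
        self.var = 0.0    # recursive scalar variance

    def update(self, x):
        """Absorb one sample in a single pass, without storing past data."""
        self.k += 1
        k = self.k
        if k == 1:
            self.mean = [float(v) for v in x]
            return
        self.mean = [((k - 1) / k) * m + v / k for m, v in zip(self.mean, x)]
        dist2 = sum((v - m) ** 2 for m, v in zip(self.mean, x))
        self.var = ((k - 1) / k) * self.var + dist2 / (k - 1)

    def eccentricity(self, x):
        """Eccentricity of x with respect to the current statistics."""
        if self.k < 2 or self.var == 0.0:
            raise ValueError("eccentricity needs at least two distinct samples")
        dist2 = sum((v - m) ** 2 for m, v in zip(self.mean, x))
        return 1.0 / self.k + dist2 / (self.k * self.var)

    def typicality(self, x):
        """Typicality is the complement of eccentricity."""
        return 1.0 - self.eccentricity(x)

    def is_eccentric(self, x, m=3.0):
        """Chebyshev-style test on the normalized eccentricity (m is an assumed default)."""
        return self.eccentricity(x) / 2.0 > (m * m + 1.0) / (2.0 * self.k)


def transition_frequencies(labels, n_states):
    """Row-normalized counts of consecutive label pairs (a simple stand-in estimate)."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for a, b in zip(labels[:-1], labels[1:]):
        counts[a][b] += 1.0
    result = []
    for row in counts:
        total = sum(row)
        # States that were never left keep an all-zero row.
        result.append([c / total if total > 0 else 0.0 for c in row])
    return result
```

In such a sketch, each cluster would hold its own `TedaState`, the label sequence would come from assigning every incoming sample to a cluster, and the resulting matrix would play the role of the HMM transition matrix.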