DC Health: Online Anomaly Detection in Datacenters
Anomaly detection; Datacenter; Half-Space-Trees.
Datacenters are critical environments for the availability of technology-based services. To keep these services highly available, performance metrics of nodes such as Virtual Machines (VMs) or VM clusters are widely monitored. These metrics, such as CPU and memory utilization, can show anomalous patterns associated with failures and performance degradation, which can culminate in resource exhaustion and total node failure. Early detection of anomalies, that is, patterns in data that deviate from expected behavior, can enable remediation measures such as VM migration and resource reallocation before losses occur. However, traditional monitoring tools often rely on fixed thresholds to detect problems on nodes and lack automatic ways to detect anomalies at runtime. In this context, machine learning techniques have been reported to detect anomalies in computer systems using both online and offline approaches. This work proposes an application called DC Health as an approach to early online detection of anomalies in datacenter nodes. The purpose of DC Health is to detect anomalies in the behavior of hosts and alert datacenter operators so that investigation and remediation measures can be taken. To this end, the research comprised i) a systematic literature mapping, ii) problem modeling from real VM data, and iii) an evaluation of DC Health using the prequential method on 6 real-world datasets. The results showed that DC Health maintained constant memory usage and achieved detection accuracy above 75%. Future work is mainly expected to evaluate the detection tool in cloud computing scenarios and to develop automated mechanisms for diagnosis and remediation.
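To illustrate the prequential (test-then-train) method mentioned above, the following minimal sketch scores each incoming observation before the detector learns from it. It is not the DC Health implementation: the detector is a simple running z-score stand-in rather than Half-Space-Trees, and the names `RunningZScoreDetector`, `prequential_accuracy`, and the threshold of 3 standard deviations are assumptions made only for this example. The stand-in keeps three numbers of state, so its memory use is constant regardless of stream length.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningZScoreDetector:
    """Hypothetical stand-in detector (NOT Half-Space-Trees)."""
    threshold: float = 3.0  # assumed cut-off, in standard deviations
    n: int = 0              # number of observations learned so far
    mean: float = 0.0       # running mean (Welford's algorithm)
    m2: float = 0.0         # running sum of squared deviations

    def score(self, x: float) -> float:
        # With fewer than two observations there is no variance estimate yet.
        if self.n < 2:
            return 0.0
        std = math.sqrt(self.m2 / (self.n - 1))
        return abs(x - self.mean) / std if std > 0 else 0.0

    def is_anomaly(self, x: float) -> bool:
        return self.score(x) > self.threshold

    def learn(self, x: float) -> None:
        # Constant-memory update of mean and variance.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)


def prequential_accuracy(stream, detector) -> float:
    """Test-then-train: predict on each sample first, then learn from it."""
    correct = 0
    total = 0
    for value, label in stream:
        predicted = detector.is_anomaly(value)  # test on the unseen sample
        correct += int(predicted == label)
        total += 1
        detector.learn(value)                   # only then update the model
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Synthetic CPU utilization (%) with one labeled anomaly; illustrative data only.
    cpu = [50, 52, 49, 51, 50, 48, 51, 99, 50, 52]
    stream = [(float(v), v == 99) for v in cpu]
    print(prequential_accuracy(stream, RunningZScoreDetector()))
```

Because every sample is used for testing before training, the prequential estimate needs no separate hold-out set, which suits an online setting where data arrives continuously.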