A Set of Independent Variables for Time Series Regression Tasks of Pandemic Scenarios Based on COVID-19
COVID-19 Pandemic, Independent Variables, Neural Networks, Time Series Regression
We propose a set of independent variables derived from an analysis of phenomena, government interventions, and events that may influence the spread of the SARS-CoV-2 virus, whether positively or negatively with respect to the trend of the case and death curves. The approach aims to identify characteristics and parameters relevant to these scenarios using regression techniques. To this end, we used machine learning methods to determine which variables should be selected and applied to the regression models. Our strategy comprised collecting and cleaning data, evaluating how well the models generalize, and applying machine learning techniques, namely regression with Seasonal ARIMA, clustering, dimensionality reduction through principal component analysis, and neural networks, together with informative data visualizations. As a result, we compile and propose a set of independent variables for predicting the number of deaths from COVID-19, with the aim of increasing accuracy, reducing the standard deviation relative to the observed values, and avoiding underspecification of the problem. The main contribution of this work is the investigation of the causes of, and possible relationships between, phenomena, government measures, and events in the spread of the virus in urban areas, such as cities and countries. The results can serve as a complementary resource to help governments and authorities make decisions in possible future pandemic scenarios similar to the COVID-19 crisis, which is still ongoing but now has a lower mortality rate.