Exploring COVID-19 Symptom Dynamics with Machine Learning: A Two-Year Analysis of Brazil's Cases
COVID-19, symptoms, machine learning, $t$-sne, apriori, xgboost, xai.
The efficient recognition and tracking of symptoms in viral infections hold great potential for swift and accurate diagnoses, which can mitigate health complications by providing important information for effective interventions. Although the World Health Organisation (WHO) has officially declared an end to the COVID-19 public health emergency, this viral disease continues to affect populations globally. Quick diagnosis based on COVID-19 symptoms remains challenging because they often resemble those of other viral infections, particularly other strains of SARS, making it difficult to identify distinct and meaningful symptom patterns as they evolve. In this context, Machine Learning (ML) techniques for automatic identification have the potential to offer a powerful solution for analysing such patterns. Thus, this study proposes a machine-learning-based approach to analyse how the predominant COVID-19 symptom patterns changed over time and to assess how these changes influenced the disease's characterisation during the first two years of the pandemic in Brazil. Using the Brazilian Severe Acute Respiratory Syndrome dataset from Sao Paulo, we compared symptom data from confirmed SARS-CoV-2 cases with those from cases labelled as unspecified SARS. Symptoms were visually examined for emerging patterns using the t-SNE dimensionality reduction technique. Subsequently, associations between prevalent symptom sets of confirmed SARS-CoV-2 and unspecified SARS cases were analysed using the Apriori association rule mining technique. Additionally, we evaluated the classification performance of the XGBoost algorithm using two time-based training-test strategies. To further explain the impact of symptom changes on model predictions, feature importance was assessed using SHAP, an explainable AI (xAI) technique.
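To make the association rule step more concrete, the following is a minimal sketch of the Apriori idea in plain Python. It is not the study's implementation. The symptom records are toy values invented for illustration (they are not drawn from the Brazilian dataset), and the names `support`, `frequent_itemsets`, and `rule_metrics` are hypothetical helpers. The sketch finds frequent symptom sets level by level and then scores a rule by its support, confidence, and lift.

```python
from itertools import combinations

# Toy, invented symptom records for illustration only (assumption, not study data).
records = [
    {"fever", "cough", "dyspnoea"},
    {"fever", "cough"},
    {"cough", "sore_throat"},
    {"fever", "cough", "loss_of_smell"},
    {"fever", "dyspnoea"},
]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= r for r in records) / len(records)


def frequent_itemsets(records, min_support):
    """Level-wise Apriori search: size-k candidates are built only from
    frequent itemsets of size k-1."""
    items = sorted({i for r in records for i in r})
    current = [frozenset([i]) for i in items
               if support([i], records) >= min_support]
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s, records)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        current = [
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
            and support(c, records) >= min_support
        ]
    return frequent


def rule_metrics(antecedent, consequent, records):
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    s_both = support(set(antecedent) | set(consequent), records)
    confidence = s_both / support(antecedent, records)
    lift = confidence / support(consequent, records)
    return s_both, confidence, lift


if __name__ == "__main__":
    for itemset, s in frequent_itemsets(records, min_support=0.4).items():
        print(sorted(itemset), round(s, 2))
    print(rule_metrics({"fever"}, {"cough"}, records))
```

A lift above 1 suggests that the symptoms in a rule co-occur more often than they would if independent, and a lift below 1 suggests the opposite. In practice a library implementation would likely be preferred over this hand-written search.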