Deep Learning techniques applied to Viral Genome Classification of SARS-CoV-2 virus
Deep Learning, SARS-CoV-2, COVID-19, Viral classification
In recent months, the world has been intensely affected by the COVID-19 pandemic, caused by the SARS-CoV-2 virus, which was first identified in December 2019 in Wuhan, China. In March 2020, the World Health Organization (WHO) declared COVID-19 a pandemic because of its geographical spread across several countries. The analysis of genomic sequences is one of the research fields within bioinformatics. When a novel virus is identified, early elucidation of the taxonomic classification and origin of its genomic sequence is essential for strategic planning, containment, and treatment. Deep learning techniques have been used successfully in many viral classification problems associated with the diagnosis of viral infections, metagenomics, and phylogenetic analysis. This work therefore proposes an efficient viral genome classifier for the SARS-CoV-2 virus based on Deep Learning techniques, such as the Stacked Sparse Autoencoder (SSAE) and the Convolutional Neural Network (CNN). Experiments with other virus datasets are also proposed. To support this process, we also intend to generate a digital signature of the virus that provides relevant numerical representations of the sequences. The preliminary results presented here, obtained with the SSAE technique applied to viral genomic sequences collected from a dataset, indicate that the approach is feasible.
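As an illustration of what a numerical representation of a genomic sequence can look like, the sketch below maps a nucleotide string to a fixed-length vector of normalized k-mer frequencies. This is an assumption made only for illustration: the abstract does not state which encoding or digital signature is used, and the function name `kmer_frequencies` and the choice of k are hypothetical. A vector of this kind could serve as input to a model such as an SSAE, but it is not presented here as the authors' method.

```python
from itertools import product


def kmer_frequencies(sequence, k=3):
    """Return normalized k-mer frequencies of a DNA sequence.

    Hypothetical helper: the output is a list of length 4**k, ordered
    lexicographically over the alphabet A, C, G, T.
    """
    alphabet = "ACGT"
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)

    seq = sequence.upper()
    total = 0
    # Slide a window of width k over the sequence.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Skip windows containing ambiguous bases such as N.
        if window in counts:
            counts[window] += 1
            total += 1

    if total == 0:
        # Sequence too short or entirely ambiguous: return a zero vector.
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


if __name__ == "__main__":
    # Toy sequence, not real viral data.
    vector = kmer_frequencies("ACGTACGTNN", k=2)
    print(len(vector))
```

Because every sequence is mapped to a vector of the same length regardless of genome size, representations of this kind are convenient inputs for fixed-size network layers. The trade-off is that positional information along the genome is discarded.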