New proposal for viral genome representation applied to the classification of SARS-CoV-2 with deep learning
COVID-19, SARS-CoV-2, Genomic representation, Genomic signal processing, Deep learning.
In December 2019, the first case of COVID-19 was identified in Wuhan, China, and by April 2021 there were already 136 million confirmed cases. Because of the virus's rapid spread, the scientific community has been working to develop viral classification techniques for SARS-CoV-2. In this work, a set of techniques from Genomic Signal Processing was used to develop a new representation of genomic data for six viruses from the Coronaviridae family, to which SARS-CoV-2 belongs. The resulting mapping was then applied in a deep learning architecture for viral classification of the samples, achieving accuracies of 94% and 91% for sequences resized to lengths 64 and 128, respectively, and a sensitivity of 100% for the vectors of size 64. Lastly, given the mutation rate of RNA viruses, new variants have emerged, and with them the possibility of an increase in cases. The developed technique was therefore used to analyze the evolution of four variants of concern in three viral classification procedures; the results helped clarify the phylogenetic relationships between the variants.
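As a rough illustration of what resizing a genomic signal to a fixed length of 64 or 128 can look like, the sketch below maps nucleotides to numbers and resamples the result by linear interpolation. It is not the representation proposed in this work: the numeric values in `NUCLEOTIDE_VALUES`, the helper names `encode_sequence` and `resize_signal`, the choice of linear interpolation, and the toy sequence are all assumptions made for the example.

```python
# Hypothetical nucleotide-to-number mapping (an assumption for illustration,
# not the mapping proposed in this work).
NUCLEOTIDE_VALUES = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0, "U": 1.0}


def encode_sequence(sequence):
    """Map each nucleotide to a number; unknown symbols (e.g. N) become 0.0."""
    return [NUCLEOTIDE_VALUES.get(base, 0.0) for base in sequence.upper()]


def resize_signal(signal, target_length):
    """Resample a 1-D signal to target_length points by linear interpolation."""
    if not signal:
        raise ValueError("signal must not be empty")
    if target_length < 1:
        raise ValueError("target_length must be at least 1")
    if len(signal) == 1 or target_length == 1:
        return [signal[0]] * target_length
    # Spacing between output samples, measured in input-index units.
    step = (len(signal) - 1) / (target_length - 1)
    resized = []
    for i in range(target_length):
        position = i * step
        left = int(position)
        right = min(left + 1, len(signal) - 1)
        fraction = position - left
        resized.append(signal[left] * (1 - fraction) + signal[right] * fraction)
    return resized


# Toy sequence, made up for the example (not a real viral genome).
toy_sequence = "ATGCGTACGTTAGCNNATGCCGTAAC"
encoded = encode_sequence(toy_sequence)
vector_64 = resize_signal(encoded, 64)
vector_128 = resize_signal(encoded, 128)
```

Fixed-length vectors such as `vector_64` and `vector_128` are the kind of input a classifier with a fixed input size would expect, whatever the original genome length.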