Advanced Convolutional Neural Network Techniques for Classification of SARS-CoV-2 Variants and Other Viruses: A Study Using k-mers and Chaos Game Representation
SARS-CoV-2; COVID-19; deep learning; convolutional neural network; k-mers; Chaos Game Representation; viral classification.
Since December 2019, the COVID-19 pandemic, caused by the SARS-CoV-2 virus, has had a profound global impact. Early identification of a virus's taxonomic classification and genomic origin is critical for strategic planning, containment, and treatment. Deep learning techniques have proven successful in a range of virus-related tasks, including diagnosis, metagenomics, phylogenetics, and genomic analysis. Motivated by these advances, this study introduces a viral genome classifier for SARS-CoV-2 built on a convolutional neural network (CNN). The CNN was trained on image representations of complete genome sequences drawn from two distinct datasets: one based on k-mer image representation and the other on Chaos Game Representation (CGR). The k-mer dataset was used for taxonomic classification experiments on SARS-CoV-2, while the CGR dataset was used to classify SARS-CoV-2 variants of concern (VOC). In taxonomic classification, the CNN achieved accuracy ranging from 92% to 100% on the validation set and from 98.9% to 100% on the test set containing SARS-CoV-2 samples. These results suggest that the model can be adapted to classify other emerging viruses. For the classification of SARS-CoV-2 variants using CGR images, the CNN reached 99.9% accuracy on the validation set and 99.8% on the test set. The findings underscore the applicability of deep learning to genome classification tasks and provide a robust tool for the early detection and classification of viral threats. Combining CNNs with k-mer and CGR image representations offers a novel and effective method for viral genome analysis, supporting ongoing efforts in virology and public health.
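To illustrate how a genome sequence can become an image suitable for a CNN, the following is a minimal sketch of Chaos Game Representation in plain Python. It is not the study's implementation: the corner assignment (A, C, G, T), the grid resolution `size`, the handling of ambiguous bases, and the function names `cgr_points` and `cgr_image` are all assumptions made for illustration.

```python
# Corner assignment is a common CGR convention, assumed here (not taken from the study).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the CGR trajectory of seq as a list of (x, y) points in the unit square."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in seq.upper():
        corner = CORNERS.get(base)
        if corner is None:
            # Simplification (assumption): skip ambiguous bases such as N.
            continue
        # Move halfway from the current point toward the corner of this base.
        x = (x + corner[0]) / 2.0
        y = (y + corner[1]) / 2.0
        points.append((x, y))
    return points


def cgr_image(seq, size=64):
    """Bin CGR points into a size x size grid and scale the counts to [0, 1]."""
    grid = [[0.0] * size for _ in range(size)]
    for x, y in cgr_points(seq):
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1.0
    peak = max(max(r) for r in grid)
    if peak > 0:
        grid = [[v / peak for v in r] for r in grid]
    return grid


# Usage on a toy sequence; a real genome would be read from a FASTA file.
image = cgr_image("ATGCGTACGTTAGC", size=8)
print(len(image), len(image[0]))
```

The resulting grid is a single-channel image in which each cell's intensity reflects how often the trajectory visits that region. A nested list like this can be converted to an array and fed to a CNN as one channel.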