Bioinformatics and Machine Learning Analysis in the Search for Biomarkers in Lung Squamous Cell Carcinoma.
Survival analysis, Machine learning, Gene signature, Lung cancer, Artificial intelligence.
Lung cancer is the leading cause of cancer death worldwide in both men and women. Lung Squamous Cell Carcinoma (LUSC) is the second most common type of lung cancer and is characterized by diagnosis at advanced stages, poor prognosis, and a strong association with smoking. Given the severity of the disease, it is essential to understand its molecular mechanisms. In this context, this study uses molecular and clinical data to implement bioinformatics and machine learning pipelines, based on Random Forest and Deep Learning, aiming to predict patient prognosis and to obtain a genetic signature of LUSC tumor progression. We analyzed clinical and molecular data from the LUSC-TCGA project and performed differential expression analyses (DEA) comparing normal tissues with tumor tissues. Based on the genes selected by DEA, patients were divided into three groups (clusters), followed by feature selection and classification steps. This yielded a classification accuracy close to 70% across the three clusters. Finally, we performed a functional enrichment analysis. In cluster 2, this analysis revealed enriched genes such as CDT1, CENPI, and NLGN1, which are associated with epithelial-mesenchymal transition (EMT). Our approach facilitated the identification of genes that are biologically relevant to LUSC development, including significant genes (such as ALDH3B1, C7, FAM83A, FOSB, GCGR, BMP7, PPP1R27, and AQP1) as well as genes relevant for predicting LUSC patient survival and potential therapeutic targets (such as FAM83A, CAV1, TNS4, EIF4G1, TFAP2A, GCGR, and PPP1R27). The results of this work were published in the articles presented in Annexes A and B.
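As an illustration of the DEA step only, the following is a minimal sketch of a tumor-versus-normal screen. It is not the pipeline used in this study: the Welch t-statistic, the thresholds, the function names, the gene labels (GENE_UP, GENE_DOWN, GENE_FLAT), and the simulated log2 expression values are all assumptions made for the example.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples with unequal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def differential_expression(tumor, normal, min_abs_lfc=1.0, min_abs_t=3.0):
    """Return {gene: (log2 fold change, t)} for genes passing both thresholds.

    tumor / normal: dict mapping gene name -> list of log2 expression values.
    The thresholds are illustrative assumptions, not values from the study.
    """
    selected = {}
    for gene in tumor:
        # Values are already on a log2 scale, so the difference of means
        # is the log2 fold change (tumor relative to normal).
        lfc = mean(tumor[gene]) - mean(normal[gene])
        t = welch_t(tumor[gene], normal[gene])
        if abs(lfc) >= min_abs_lfc and abs(t) >= min_abs_t:
            selected[gene] = (lfc, t)
    return selected


# Synthetic data with hypothetical gene labels (no real measurements).
rng = random.Random(0)


def simulate(mu, n=30, sd=0.5):
    """Draw n simulated log2 expression values around mean mu."""
    return [rng.gauss(mu, sd) for _ in range(n)]


tumor = {
    "GENE_UP": simulate(8.0),    # simulated as higher in tumor
    "GENE_DOWN": simulate(3.0),  # simulated as lower in tumor
    "GENE_FLAT": simulate(5.0),  # simulated as unchanged
}
normal = {
    "GENE_UP": simulate(5.0),
    "GENE_DOWN": simulate(6.0),
    "GENE_FLAT": simulate(5.0),
}

hits = differential_expression(tumor, normal)
for gene, (lfc, t) in sorted(hits.items()):
    print(f"{gene}: log2FC={lfc:.2f}, t={t:.1f}")
```

In a real analysis, the selected genes would then feed the grouping, feature selection, and classification steps, and a count-based method with multiple-testing correction would normally replace this simple per-gene test.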