DEFENSE committee: DÉBORA VIRGÍNIA DA COSTA E LIMA

A DOCTORAL DEFENSE committee has been registered by the program.
STUDENT: DÉBORA VIRGÍNIA DA COSTA E LIMA
DATE: 24/07/2025
TIME: 08:00
LOCATION: meet.google.com/cbr-pkcn-few
TITLE:

Bioinformatics and Machine Learning Analysis in the Search for Biomarkers in Lung Squamous Cell Carcinoma.


KEY WORDS:

Lung cancer, Survival analysis, Machine learning, Gene signature, Artificial intelligence.


PAGES: 57
BIG AREA: Biological Sciences
AREA: Genetics
SUMMARY:

Lung cancer is the leading cause of cancer death worldwide, regardless of gender. Among lung cancer types, lung squamous cell carcinoma (LUSC) is the second most common, characterized by advanced-stage diagnosis, poor prognosis, and a strong association with smoking. Due to the severity of lung cancer, it is essential to understand its molecular mechanisms. In this context, this study uses molecular data to identify biomarkers in LUSC. The work uses molecular and clinical data to implement bioinformatics and machine learning pipelines, predict patient prognosis, and obtain a gene signature of LUSC tumor progression. We analyzed clinical and molecular data from the LUSC-TCGA project and performed differential expression analysis (DEA) comparing normal tissues with tumor tissues. Based on the genes selected by DEA, the patients were divided into three groups, followed by feature selection and classification steps. This yielded classification accuracy close to 70% for the three clusters. Finally, we performed a functional enrichment analysis. The analysis revealed enriched genes in the cluster, such as CDT1, CENPI, and NLGN1, associated with the epithelial-mesenchymal transition (EMT) process. Our approach facilitated the identification of genes that are biologically relevant to the LUSC development process (such as ALDH3B1, C7, FAM83A, FOSB, GCGR, BMP7, PPP1R27, and AQP1) and genes pertinent to predicting patient survival and to potential therapeutic targets for LUSC (such as FAM83A, CAV1, TNS4, EIF4G1, TFAP2A, GCGR, and PPP1R27). Next, the expression data of the selected gene sets in the clusters were combined with feature selection, data balancing, machine learning, and explainable artificial intelligence (XAI) to identify a signature with potential staging-related biomarkers.
The employed methods demonstrated robust classification metrics, with the random forest classifier achieving the highest accuracy (0.91). The use of data balancing and feature selection techniques proved to be crucial in the classification process. Furthermore, it was possible to identify the 16 most relevant genes selected by random forest using the SHapley Additive Explanations (SHAP) method. Among them, three genes (MYOSLID, IMPDH1P8, and COL9A3) were chosen by all successful classifiers, positioning themselves as potential staging biomarkers and possible molecular therapeutic targets for LUSC.
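
The summary does not say which data balancing technique was used. As an illustration only, the sketch below assumes simple random oversampling, in which minority-class samples are duplicated until every class matches the largest one. The function name `random_oversample`, the toy expression values, and the stage labels are hypothetical and are not taken from the thesis.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class (assumed technique)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # Pool of original samples belonging to this class.
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Toy expression vectors (hypothetical values) with imbalanced stage labels.
X = [[0.1, 2.3], [0.4, 1.9], [0.3, 2.1], [1.8, 0.2]]
y = ["early", "early", "early", "late"]

X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)
```

In practice, balancing of this kind is normally applied only to the training portion of the data, so that duplicated samples do not leak into the evaluation set and inflate the reported accuracy.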


COMMITTEE MEMBERS:
Chair - 1365498 - BEATRIZ STRANSKY FERREIRA
Internal - 3063244 - TETSU SAKAMOTO
External to the Institution - ANDRE MAURICIO RIBEIRO DOS SANTOS - UFPA
External to the Institution - ANDRÉ LUÍS FONSECA FAUSTINO - UFPA
External to the Institution - TAFFAREL MELO TORRES - UFERSA
Notice registered on: 11/07/2025 11:34