DEFENSE Committee: LUCAS DE FREITAS LACERDA

A MASTER'S DEFENSE committee has been registered by the program.
STUDENT: LUCAS DE FREITAS LACERDA
DATE: 04/09/2024
TIME: 10:00
LOCATION: Google Meet, https://meet.google.com/eaz-qtrj-sxa
TITLE:

DEVELOPMENT OF A PIPELINE FOR REDUCED SNP PANEL IDENTIFICATION FOR SPECIES IDENTIFICATION TAKING INTO ACCOUNT HYBRIDIZATION

 


KEY WORDS:

Hybridization; Genetic Markers; Machine Learning; Annotation; Conservation


PAGES: 56
MAJOR AREA: Biological Sciences
AREA: Genetics
SUMMARY:

The anthropogenic pressures experienced by the remnants of the Atlantic Forest on the northeastern coast of Brazil are reflected in the conservation status of its fauna, including the Neotropical primates. Aiming to conserve the threatened primates of the Northeast, the National Center for Research and Conservation of Brazilian Primates, CPB/ICMBio, coordinates the National Action Plan for the Conservation of Northeast Primates (PAN-PRINE). One of the target species is the blonde capuchin monkey (Sapajus flavius), categorized as Endangered. To contribute to the implementation of the PAN-PRINE's actions, this study aimed to analyze the genetic structure of samples from both wild and captive individuals of the genus Sapajus and to propose a panel of genetic markers for differentiating two parental species and their hybrids using machine learning techniques. Two population structure analyses were conducted: an exploratory analysis with various species of the genus and captive samples (n=228), and a specific analysis with captive samples and natural populations of S. flavius and S. libidinosus, including natural hybrids between these species. Based on the exploratory analysis, eight captive samples that did not exhibit the expected ancestry pattern for the hybridizing species of interest were removed from the dataset. Of the remaining samples, 30 were classified as hybrids, 14 as S. libidinosus, and 8 as S. flavius, based on the ancestry coefficient threshold established to identify a species (Q>90%). These samples, together with the wild ones, were partitioned into 20% for the validation dataset and 80% for the training and testing dataset. Six supervised learning algorithms were used to train predictive models: k-Nearest Neighbors (kNN), Decision Tree (DT), Naive Bayes (NVB), Support Vector Machine (SVM), Extreme Gradient Boosting (XGB), and Random Forest (RF), followed by feature selection. All models were trained using data partitions with K-fold (K=5).
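The labeling and partitioning steps described above can be illustrated with a minimal sketch. It is not the pipeline used in the study: the function names, the two-population representation of Q as a single S. flavius proportion, the fixed random seed, and the unstratified shuffle are all assumptions made for illustration.

```python
import random


def label_by_ancestry(q_flavius, threshold=0.90):
    """Label a sample from a two-population ancestry coefficient.

    Assumption: q_flavius is the S. flavius ancestry proportion in [0, 1],
    and the remainder (1 - q_flavius) is S. libidinosus ancestry.
    """
    if not 0.0 <= q_flavius <= 1.0:
        raise ValueError("ancestry coefficient must lie in [0, 1]")
    if q_flavius > threshold:
        return "S. flavius"
    if 1.0 - q_flavius > threshold:
        return "S. libidinosus"
    return "hybrid"


def holdout_and_folds(n_samples, holdout_fraction=0.20, k=5, seed=0):
    """Split sample indices into a validation hold-out and k folds.

    Assumption: a plain shuffled split; the source does not say
    whether the partition was stratified by group.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_holdout = round(n_samples * holdout_fraction)
    validation = indices[:n_holdout]
    working = indices[n_holdout:]
    # Deal the remaining indices round-robin into k folds.
    folds = [working[i::k] for i in range(k)]
    return validation, folds


if __name__ == "__main__":
    for q in (0.97, 0.55, 0.04):
        print(q, label_by_ancestry(q))
    validation, folds = holdout_and_folds(100)
    print(len(validation), [len(f) for f in folds])
```

In each cross-validation round, one fold would serve as the test partition and the remaining four as the training partition, while the hold-out indices stay untouched until final validation.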
Forward feature selection was used to select 15, 30, and 45 features. The RF, SVM, and NVB models consistently ranked highest as the number of features increased, based on the accuracy score in the validation dataset, with RF yielding the best results for the larger numbers of SNPs. When the SNP sets selected by the models were instead ranked by the quality of the clustering they produced under an unsupervised method, XGB and kNN emerged as the top models based on the Rand Score. None of the high-impact variants for group identification were located in coding regions of the genome; the majority were found in intergenic regions (n=20) and intronic regions that may belong to different gene splicing variants (n_vars=24, n_genes=119). From the initial set of 2484 SNPs, the dimensionality of the data was drastically reduced while maintaining highly informative variants for group differentiation. Moreover, most of these variants do not impact coding regions but are highly associated with species differentiation. These results are important for developing a product that can serve as a tool for conservation action plans for threatened species and for management decisions that consider the genetic profile of the populations and species studied, enabling more effective conservation measures.
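For readers unfamiliar with the Rand Score mentioned above, the following minimal sketch computes the plain (unadjusted) Rand index between a reference grouping and a clustering. Whether the study used the unadjusted or the adjusted variant is not stated in the summary, so the choice here is an assumption, and the function name is hypothetical.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of sample pairs on which two partitions agree.

    A pair counts as an agreement when both partitions place the two
    samples together, or both place them apart.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        raise ValueError("at least two samples are required")
    agreements = sum(
        1
        for i, j in pairs
        if (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
    )
    return agreements / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the study.
    reference = ["flavius", "flavius", "hybrid", "libidinosus"]
    clustering = [0, 0, 1, 2]
    print(rand_index(reference, clustering))
```

The index depends only on which samples are grouped together, so cluster identifiers do not need to match the reference label names.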

 

 


COMMITTEE MEMBERS:
Chair - 3063244 - TETSU SAKAMOTO
External Member - PATRICIA DOMINGUES DE FREITAS - UFSCAR
External Member - THAIS GAUDENCIO DO REGO - UFPB
Notice registered on: 29/08/2024 15:40