Dissertations/Theses

Click here to access the files directly from the UFRN Digital Library of Theses and Dissertations (Biblioteca Digital de Teses e Dissertações da UFRN)

2024
Dissertations
1
  • DANIEL HENRIQUE FERREIRA GOMES
  • USE AND DEVELOPMENT OF COMPUTATIONAL METHODS TO SOLVE BIOLOGICAL PROBLEMS.

  • Advisor : Jorge Estefano de Santana Souza
  • COMMITTEE MEMBERS :
  • Jorge Estefano de Santana Souza
  • BEATRIZ STRANSKY FERREIRA
  • INACIO GOMES MEDEIROS
  • Date: Mar 28, 2024
  • The explosion of genomic data in recent decades has presented a substantial challenge, requiring new approaches for efficient analysis and interpretation. This research emerges in this context, offering comprehensive bioinformatics analysis, exploring various facets of genomics and its relevance to health. The study encompasses the analysis of mitochondrial genomes of Amazonian species, the investigation of genetic variants and their correlation with the survival of gastric cancer patients in Natal-RN, and the development of the DTreePred application, designed to predict the pathogenicity of these variants. Additionally, the results of the analysis of gastric cancer patients in Belém-PA are discussed, employing machine learning for disease detection based on genetic variants. To validate the AI models developed based on the Pará population, public samples of Korean patients with and without gastric cancer were used. It is noteworthy that the most effective models achieved an accuracy of over 90% in classifying Korean patients as normal or cancer patients. This research thus highlights the productive integration of bioinformatics techniques in genomic research and the understanding of complex diseases, representing significant advances in the fields of health and genomics.

2
  • LUCAS DE FREITAS LACERDA
  • DEVELOPMENT OF A PIPELINE FOR ANALYSIS OF OPTIMIZED SNPs FOR IDENTIFICATION OF SPECIES AND THEIR HYBRIDS: A CASE STUDY IN Sapajus (Primates)

  • Advisor : TETSU SAKAMOTO
  • COMMITTEE MEMBERS :
  • TETSU SAKAMOTO
  • AMELY BRANQUINHO MARTINS
  • PATRICIA DOMINGUES DE FREITAS
  • THAIS GAUDENCIO DO REGO
  • Date: Sep 4, 2024
  • The anthropogenic pressures on the remnants of the Atlantic Forest along the northeastern coast of Brazil are reflected in the conservation status of its fauna, including the Neotropical primates. Aiming to conserve the threatened primates of the Northeast, the National Center for Research and Conservation of Brazilian Primates (CPB/ICMBio) coordinates the National Action Plan for the Conservation of Northeast Primates (PAN-PRINE). One of the target species is the blonde capuchin monkey (Sapajus flavius), categorized as Endangered. To contribute to the implementation of PAN-PRINE's actions, this study aimed to analyze the genetic structure of samples from both wild and captive individuals of the genus Sapajus and to propose a panel of genetic markers for differentiating two parental species and their hybrids using machine learning techniques. Two population structure analyses were conducted: an exploratory analysis with various species of the genus and captive samples (n=228), and a specific analysis with captive samples and natural populations of S. flavius and S. libidinosus, including natural hybrids between these species. The exploratory analysis led to the removal of eight captive samples that did not exhibit the expected ancestry pattern for the hybridizing species of interest. Of the remaining samples, 30 were classified as hybrids, 14 as S. libidinosus, and 8 as S. flavius, based on the ancestry coefficient threshold established to identify a species (Q>90%). These samples, together with the wild ones, were partitioned into 20% for the validation dataset and 80% for the training and testing dataset. Six supervised learning algorithms were used to train predictive models: k-Nearest Neighbors (kNN), Decision Tree (DT), Naive Bayes (NVB), Support Vector Machine (SVM), Extreme Gradient Boosting (XGB), and Random Forest (RF), followed by feature selection. All models were trained using K-fold data partitions (K=5). Forward feature selection was used to select 15, 30, and 45 features. The RF, SVM, and NVB models consistently ranked highest as the number of features increased, based on the accuracy score in the validation dataset, with RF yielding the best results for the larger numbers of SNPs. When we ranked the SNP sets selected by the models according to the best clustering generated by an unsupervised methodology, XGB and kNN emerged as the top models based on the Rand score. None of our high-impact variants for group identification were located in coding regions of the genome; the majority were found in intergenic regions (n=20) and intronic regions that may belong to different gene splicing variants (n_vars=24, n_genes=119). From the initial set of 2484 SNPs, we drastically reduced the dimensionality of our data while retaining highly informative variants for group differentiation. Moreover, we found that most of these variants do not affect coding regions but are highly associated with species differentiation. These results are important for developing a product that can serve as a tool for conservation action plans for threatened species and for management decisions that consider the genetic profile of the populations and species studied, enabling more effective conservation measures.
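The labeling and partitioning steps described in this abstract can be illustrated with a minimal sketch. It assumes a two-population ancestry model (so the S. libidinosus coefficient is one minus the S. flavius coefficient) and a simple shuffled 80/20 split; the function names, the seed, and the split procedure are hypothetical and are not taken from the dissertation.

```python
import random


def label_by_ancestry(q_flavius, threshold=0.90):
    """Label one sample from its ancestry coefficient for S. flavius.

    Assumption: two ancestral populations, so the S. libidinosus
    coefficient is 1 - q_flavius. A sample whose coefficient does not
    exceed the threshold for either parental species is called a hybrid.
    """
    if not 0.0 <= q_flavius <= 1.0:
        raise ValueError("ancestry coefficient must be in [0, 1]")
    if q_flavius > threshold:
        return "S. flavius"
    if 1.0 - q_flavius > threshold:
        return "S. libidinosus"
    return "hybrid"


def split_train_validation(samples, validation_fraction=0.20, seed=0):
    """Shuffle samples reproducibly and hold out a validation fraction.

    Returns (training_and_testing, validation). The seed is an
    illustrative choice, not a value reported in the abstract.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]
```

In this sketch the training and testing portion would then be passed to the K-fold procedure, while the validation portion stays untouched until the final accuracy comparison.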

     

     

Theses
1
  • LUKAS IOHAN DA CRUZ CARVALHO
  • EVALUATION OF A NEW NEURONAL INDUCTION PROTOCOL USING SINGLE-CELL RNA-SEQUENCING AND MACHINE LEARNING

  • Advisor : MARCOS ROMUALDO COSTA
  • COMMITTEE MEMBERS :
  • CECÍLIA HEDIN-PEREIRA
  • MARCOS ROMUALDO COSTA
  • MYCHAEL VINÍCIUS DA COSTA LOURENÇO
  • RODRIGO JULIANI SIQUEIRA DALMOLIN
  • TARCISO ANDRE FERREIRA VELHO
  • Date: Feb 26, 2024
  • Cell type identification is a critical step in the computational analysis of scRNA-Seq experiments, involving the unsupervised grouping of cells based on gene expression profiles. Traditional methods relying on canonical gene markers have limitations, such as sensitivity to variations and the absence of characteristic genes for certain cell types. To address these challenges, we propose a novel approach combining machine learning algorithms with feature selection. Our methodology involves selecting a dataset suitable for training a model to ensure generalization to new data. We chose a comprehensive dataset encompassing the central and peripheral nervous system of mice at different developmental stages. Subsequently, feature selection was applied using the DUBStepR algorithm, which considers gene-gene correlations to identify optimal features for cell classification. The resulting dataset, composed of 28,795 cells and 16,960 genes, was used to train and evaluate models employing the k-Nearest Neighbors (kNN), Decision Tree (DT), Naive Bayes (NB), Support Vector Machine (SVM), and Multilayer Perceptron (MLP) algorithms. All models achieved F1-scores exceeding 90%, except for NB. Testing on a human brain scRNA-Seq dataset confirmed the robustness of the algorithms, with area under the curve (AUC) values indicating accurate cell classification. SVM and MLP were selected for further analysis due to their lower false positive and false negative rates. Comparisons with existing tools such as scAnnotatR and ACTINN highlight the versatility of our approach, particularly when dealing with diverse cell types. Next, we applied the SVM and MLP models to classify human-induced neurons (hiNs) generated in vitro using distinct protocols, achieving consistent results in identifying glutamatergic and GABAergic neurons. We also attempted to classify hiNs according to cells of different brain regions, which revealed challenges in classifying GABAergic neurons by region, possibly due to a limited number of optimal features. Gene expression analysis and Gene Set Enrichment Analysis (GSEA) helped identify gene sets associated with the electrophysiological maturation of glutamatergic hiNs generated through an alternative protocol using ASCL1 compared to other protocols. Regulatory network analysis identified master transcription factors with higher activity specifically in this protocol. In conclusion, our integrated approach of feature selection and machine learning algorithms offers an alternative way of identifying cell groups based on gene expression profiles, refining single-cell analysis in the context of differential gene expression, GSEA, and regulatory gene networks.

2
  • LUCAS FELIPE DA SILVA
  • BIOINFORMATICS APPROACHES APPLIED IN THE ANALYSIS OF DATA GENERATED BY ABIOTIC STRESS: MICROGRAVITY AND BY HYDROGEN PEROXIDE IN SUGARCANE PLANTS
  • Advisor : KATIA CASTANHO SCORTECCI
  • COMMITTEE MEMBERS :
  • KATIA CASTANHO SCORTECCI
  • BEATRIZ STRANSKY FERREIRA
  • ADRIANA FERREIRA UCHOA
  • FATIMA CERQUEIRA ALVIM
  • TERCILIO CALSA JUNIOR
  • Date: Mar 7, 2024
  • Sugarcane (Saccharum spp.) is a monocotyledonous plant of the Poaceae family, a C4 plant adapted to tropical and subtropical environments, and Brazil is the world's largest producer. Plants can be subject to various biotic and abiotic factors that may induce oxidative stress. This stress is associated with an imbalance between the production and degradation of Reactive Oxygen Species (ROS), a condition that can affect plant development. Hydrogen peroxide (H2O2) acts as a signaling molecule in response to various cellular stimuli in plants. This thesis was therefore divided into two chapters. In the first chapter, bioinformatics tools were used to understand how changes in the gravitational field can trigger responses such as oxidative stress in sugarcane plants, based on messenger RNA sequencing data. In the second chapter, oxidative stress was induced in sugarcane plants by exogenous application of H2O2 (0 mM, 10 mM, 20 mM, and 30 mM) for 8 hours at a temperature of 25-27 °C. Bioinformatic analyses were then conducted on proteomic data obtained from the roots and leaves of the treated material. The aim of this work was to identify, in both chapters, genes/proteins with differential expression in roots and leaves under microgravity conditions, as well as in response to different concentrations of H2O2. For this purpose, in both approaches, the species Sorghum bicolor, Zea mays, and Oryza sativa subsp. japonica were used as references. The bioinformatics analyses revealed unique and specific genes in each of the nine analyzed data libraries, highlighting genes such as C5WVD4 and C5YLK6, associated with isoleucine synthesis and NADPH, respectively, in response to microgravity, and genes with altered expression at different concentrations of H2O2, such as C5XFH6 and B4G143, associated with NADPH supply and photosynthesis in the positive regulation of ROS, respectively. Enriched metabolic pathways in response to microgravity and H2O2, including Selenocompound metabolism, Photosynthesis - antenna proteins, and Pentose phosphate pathway, were also identified. This multidisciplinary study, which combines histology, biochemistry, RNA-seq analysis, and proteomics, provides a comprehensive understanding of the effects of microgravity and H2O2 on sugarcane, highlighting changes in tissue structural organization, lignin accumulation, H2O2, and ROS. This work therefore helped identify unique and specific genes/proteins expressed in each tissue and the metabolic pathways activated in leaves and roots, elucidating the diverse responses of sugarcane plants under altered gravity conditions during the VSB-30 sounding rocket flight and under exposure to different concentrations of H2O2. It reveals a complex network of genes and metabolic pathways that act in response to oxidative stress conditions, triggering defence and tolerance mechanisms. The data obtained advance the understanding of how plants respond to each of the analyzed adverse conditions, employing specific adaptive strategies. Additionally, they emphasize the importance of H2O2 in adaptive and survival responses, as well as the versatility of the phytohormone abscisic acid (ABA) in signaling between roots and leaves. These findings provide valuable insights for the development of genetic improvement strategies and optimized cultivation practices for plant performance under variable conditions.

3
  • LEONARDO RENE DOS SANTOS CAMPOS
  • Large-Scale Inference of Evolutionary Roots of Orthologous Genes with the Bridge Algorithm

  • Advisor : RODRIGO JULIANI SIQUEIRA DALMOLIN
  • COMMITTEE MEMBERS :
  • CESAR RENNO COSTA
  • EDUARDO BOUTH SEQUERRA
  • RODRIGO JULIANI SIQUEIRA DALMOLIN
  • SÁVIO TORRES DE FARIAS
  • WILFREDO BLANCO FIGUEROLA
  • Date: Mar 8, 2024
  • Methods for reconstructing evolutionary scenarios are important tools that help to better understand biological systems from the perspective of their origins and evolution. The primary concept for understanding them lies in the relationships established by comparing genomes from different species to form gene families known as Orthologous Groups (OGs). Orthologs are genes from distinct species derived from their last common ancestor (LCA), typically having similar functions in each organism. By observing the phyletic pattern of an OG in a species tree, it is possible to infer the LCA in which the trait represented by the OG most probably emerged. Although this process can be trivial when applied to a single gene, it remains challenging for large-scale queries. The Bridge algorithm, structured as an R software package, makes it possible to interrogate several hundred to thousands of OGs at once, assigning an evolutionary root to each OG. This thesis constitutes a comprehensive reference to the method of rooting orthologous genes employed by the Bridge algorithm, presenting its detailed logic, implementation, accuracy, and performance.
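To make the idea of rooting an OG from its phyletic pattern concrete, here is a deliberately naive sketch in Python. It simply returns the last common ancestor of every species in which the OG is present. This is not the Bridge algorithm, whose logic is described in the thesis itself; the toy species tree, the node names, and the function names are all hypothetical.

```python
def path_to_root(node, parent):
    """Return the list of nodes from `node` up to the root of the tree."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def naive_root(species_with_og, parent):
    """Return the LCA of all species that carry the orthologous group.

    `parent` maps each node of the species tree to its parent node;
    the root has no entry. A single isolated presence would pull the
    root deeper, which is the weakness a real method has to handle.
    """
    species = list(species_with_og)
    if not species:
        raise ValueError("the OG must be present in at least one species")
    # Ancestors of the first species, ordered from leaf to root.
    common = path_to_root(species[0], parent)
    for sp in species[1:]:
        ancestors = set(path_to_root(sp, parent))
        common = [node for node in common if node in ancestors]
    # The first surviving node is the deepest shared ancestor.
    return common[0]


# Toy species tree (illustrative only).
TOY_TREE = {
    "human": "Mammalia",
    "mouse": "Mammalia",
    "Mammalia": "Vertebrata",
    "zebrafish": "Vertebrata",
    "Vertebrata": "Metazoa",
    "fly": "Metazoa",
}
```

Calling `naive_root({"human", "zebrafish"}, TOY_TREE)` walks both lineages upward and returns the first node they share. Repeating that per OG is what becomes expensive at the scale of thousands of groups.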

4
  • FÁBIO FONSECA DE OLIVEIRA
  • Proposed FPGA-Based Hardware Architectures for Acceleration of Smith-Waterman and K-Mers Algorithms

  • Advisor : MARCELO AUGUSTO COSTA FERNANDES
  • COMMITTEE MEMBERS :
  • MARCELO AUGUSTO COSTA FERNANDES
  • RENAN CIPRIANO MOIOLI
  • DANIEL SABINO AMORIM DE ARAUJO
  • CARLOS ALBERTO VALDERRAMA SAKUYAMA
  • LUCILEIDE MEDEIROS DANTAS DA SILVA
  • Date: Apr 5, 2024
  • In this work, we address the growing challenge of efficiently processing the vast and continuously expanding volume of data in biological databases. The need for fast and accurate sequence analysis techniques is more pressing than ever, given the importance of identifying similarities between biological sequences for applications in genomics, taxonomy, and beyond. Central to this effort is optimizing sequence alignment algorithms, particularly Smith-Waterman (SW), a high-precision method based on dynamic programming, and K-Mers, a technique for counting subsequences that is fundamental in genomic analysis. We propose an innovative parallel hardware architecture for the SW algorithm, incorporating a systolic array structure that significantly accelerates the forward and backward phases of alignment. This architecture pre-organizes the alignment in the forward stage, reducing the complexity of the subsequent backtracking initiated from the maximum score position. Validated on a Field-Programmable Gate Array (FPGA), the architecture achieved a rate of up to 79.5 Giga Cell Updates per Second (GCUPS), demonstrating a notable advancement in processing efficiency. Additionally, we developed a K-Mers-based algorithm focused on the exact extraction of short subsequences, characterized by its low memory consumption, feasible execution time, high parallelization capability, and energy efficiency. Primarily intended for use on FPGAs, the algorithm is also adaptable to other hardware platforms. These contributions not only set new standards in speed and efficiency for the processing of biological data but also pave the way for significant advances in genomic and taxonomic research, among other areas of bioinformatics.
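For readers unfamiliar with the two algorithms named in this abstract, the following is a plain software sketch in Python, not the proposed FPGA architecture. It shows the forward phase of Smith-Waterman (filling the score matrix and tracking the maximum), the backward phase (tracing back from the maximum score position), and exact k-mer counting. The scoring values and the linear gap penalty are illustrative assumptions.

```python
from collections import Counter


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Local alignment with a linear gap penalty.

    Returns (best_score, aligned_a, aligned_b).
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Forward phase: each cell update is one unit of the GCUPS metric.
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + step,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Backward phase: trace back from the maximum until a zero cell.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def count_kmers(seq, k):
    """Count every overlapping subsequence of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
```

The nested loops perform `len(a) * len(b)` cell updates one after another. A systolic array instead computes the cells of each anti-diagonal in parallel, which is where hardware acceleration of this kind gets its throughput.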

5
  • GABRIEL BEZERRA MOTTA CÂMARA
  • Advanced Convolutional Neural Network Techniques for Classification of SARS-CoV-2 Variants and Other Viruses: A Study Using k-mers and Chaos Game Representation

  • Advisor : MARCELO AUGUSTO COSTA FERNANDES
  • COMMITTEE MEMBERS :
  • TÚLIO DE LIMA CAMPOS
  • GUILHERME DE ALENCAR BARRETO
  • IVANOVITCH MEDEIROS DANTAS DA SILVA
  • MARCELO AUGUSTO COSTA FERNANDES
  • PATRICK CESAR ALVES TERREMATTE
  • Date: Sep 5, 2024
  • Since December 2019, the global impact of the COVID-19 pandemic, caused by the SARS-CoV-2 virus, has been profound. Early identification of the virus’s taxonomic classification and genomic origin is critical for strategic planning, containment, and treatment. Deep learning techniques have proven successful in addressing various viral classification challenges, including diagnosis, metagenomics, phylogenetics, and genomic analysis. Motivated by these advances, this study introduces an effective viral genome classifier for SARS-CoV-2, utilizing a convolutional neural network (CNN) framework. This research employed image representations of complete genome sequences to train the CNN, leveraging two distinct datasets: one based on k-mer image representation and the other on Chaos Game Representation (CGR). The k-mer dataset was used for taxonomic classification experiments of the SARS-CoV-2 virus, while the CGR dataset focused on classifying variants of concern (VOC) of SARS-CoV-2. The CNN achieved remarkable performance in taxonomic classification, with accuracy rates ranging from 92% to 100% on the validation set and between 98.9% and 100% on the test set containing SARS-CoV-2 samples. These results demonstrate the model’s adaptability for classifying other emerging viruses. For the classification of SARS-CoV-2 variants using CGR images, the CNN delivered even higher accuracy, reaching 99.9% on the validation set and 99.8% on the test set. The findings underscore the applicability of deep learning techniques in genome classification tasks, providing a robust tool for the early detection and classification of viral threats. The integration of CNNs with k-mer and CGR image representations presents a novel and effective method for viral genome analysis, supporting ongoing efforts in virology and public health.
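As a rough illustration of how a genome sequence becomes an image, the sketch below computes Chaos Game Representation points and bins them into a small count grid. The corner assignment (A, C, G, T), the starting point, the handling of ambiguous bases, and the grid resolution are assumptions made for illustration; the thesis may use different conventions and image sizes.

```python
# One common corner convention; others exist in the literature.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the CGR trajectory of a DNA sequence in the unit square.

    Starting from the centre, each base moves the current point halfway
    towards that base's corner. Bases outside A/C/G/T are skipped.
    """
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # e.g. N or other ambiguity codes
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def cgr_image(seq, resolution=8):
    """Bin CGR points into a resolution x resolution grid of counts."""
    grid = [[0] * resolution for _ in range(resolution)]
    for x, y in cgr_points(seq):
        col = min(int(x * resolution), resolution - 1)
        row = min(int(y * resolution), resolution - 1)
        grid[row][col] += 1
    return grid
```

A grid like this, normalized to pixel intensities, is the kind of fixed-size input a CNN can consume regardless of genome length.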

2023
Dissertations
1
  • DOUGLAS FELIPE DE LIMA SILVA
  • Genomic analysis of petroleum hydrocarbon degrading microorganisms and their potential performance on priority hydrocarbons

  • Advisor : LUCYMARA FASSARELLA AGNEZ LIMA
  • COMMITTEE MEMBERS :
  • LUCYMARA FASSARELLA AGNEZ LIMA
  • RODRIGO JULIANI SIQUEIRA DALMOLIN
  • ANA TEREZA RIBEIRO DE VASCONCELOS
  • Date: Feb 28, 2023
  • Contamination of soils and marine ecosystems by petroleum hydrocarbons, released in large and small oil spills throughout the supply chain, has serious consequences for the environment. Among the existing strategies to mitigate environmental impacts in affected areas, bioremediation by bioaugmentation using organisms capable of degrading oil is an alternative that offers a better cost-benefit ratio and promotes greater removal of compounds than physical-chemical methods. National and international environmental regulatory agencies list 179 compounds as priority for bioremediation due to their toxic and/or mutagenic potential. In previous work, the research group of the Laboratory of Molecular Biology and Genomics has been obtaining bacterial isolates from samples of oil-contaminated environments and maintaining a stock of these isolates, which make up a bank of microorganisms preserved by the laboratory. The genomes of isolates with a promising profile for bioremediation are being sequenced in an attempt to identify their taxonomic and metabolic profiles. So far, sequencing the complete genomes of 22 bacterial isolates previously obtained by the group, together with sequencing the 16S gene of 18 isolates obtained during this work from oil samples collected on beaches in Rio Grande do Norte, has resulted in the identification of 10 genera of bacteria able to grow using oil as a carbon source. The analysis of the generated data, using the R programming language, allowed the isolates to be compared with their respective reference genomes, determining their relationships and particularities. Among all isolates with a completely sequenced genome, we identified 53 genes encoding enzymes present in 20 KEGG pathways of xenobiotic degradation and metabolism, which participate in the degradation of 37 hydrocarbons reported as priority, as well as similarities in the degradation profiles of the isolates. Through in silico analysis, a consortium of 4 isolates was proposed with the potential to act in the bioremediation of 34 of the 37 compounds.

2
  • LEONARDO CABRAL AFONSO FERREIRA
  • Structure and diversity of the rfb locus in bacteria of the genus Leptospira and its association with serological classification

  • Advisor : TETSU SAKAMOTO
  • COMMITTEE MEMBERS :
  • GUSTAVO ANTONIO DE SOUZA
  • Maria Raquel Venturim Cosate
  • TETSU SAKAMOTO
  • Date: Mar 24, 2023
  • Leptospirosis is considered a globally important zoonosis due to its widespread distribution and virulence, affecting both humans and commercially important animals. It is caused by pathogenic bacteria of the genus Leptospira and phylum Spirochaetes, and contamination occurs through direct or indirect contact with the contaminating agent present in the environment, such as urine from infected animals or contaminated water and soil. The genus has 68 species that can be grouped into two major groups according to their lifestyle: pathogenic and saprophytic. In addition to taxonomic classification, samples of this genus can be classified based on their antigenic characteristics into serogroups and serovars. Serological classification is of great relevance in the fields of epidemiology and clinical analysis, but the methods used for this classification are laborious, require infrastructure and specialized labor, and take days to obtain results. In this study, we aimed to find genetic patterns associated with the serological classification of Leptospira bacteria by analyzing the genetic composition of the rfb locus and proposing methods that allow for the classification of Leptospira samples at the serogroup level. To do this, we used genomic data from 68 species classified into 27 serogroups, which are distributed in 722 samples available in public databases. We identified the genes that are part of the rfb locus through orthologous groups in samples that contained the intact rfb locus in a single contig. We used a hierarchical clustering method to group samples with similar genetic profiles of the rfb locus. This analysis made it possible to survey the diversity of the genetic composition of the rfb locus in the genus Leptospira and to observe a correspondence between serogroup classification and the groups formed by hierarchical clustering. The generated clustering suggests the classification of samples into six large classes that, in addition to presenting serological affinity, share similarities in the genetic composition of the rfb locus. It was observed that samples of the same serogroup share similarities in the genetic composition of the rfb locus. Additionally, it was possible to verify the existence of different gene blocks that may be conserved in samples belonging to different species and serogroups. It is presumed that different combinations of these gene blocks result in the synthesis of different O-antigen structures of lipopolysaccharides and consequently different serogroups. This study makes it possible to suggest molecular markers that enable molecular strategies for the serological identification of Leptospira.
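The grouping of samples by rfb locus composition can be sketched as agglomerative clustering over gene presence/absence profiles. The abstract does not state which distance or linkage was used, so the Jaccard distance and average linkage below are assumptions, and the sample and gene names in the usage note are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |intersection| / |union| for two sets of gene identifiers."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(jaccard_distance(profiles[x], profiles[y])
                for x in c1 for y in c2)
    return total / (len(c1) * len(c2))


def cluster_profiles(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    `profiles` maps a sample name to the set of orthologous groups
    found in its rfb locus. Returns a list of sorted sample-name lists.
    """
    clusters = [[name] for name in sorted(profiles)]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], profiles)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]
```

For example, with `{"s1": {"g1", "g2", "g3"}, "s2": {"g1", "g2", "g3", "g4"}, "s3": {"g7", "g8"}, "s4": {"g7", "g8", "g9"}}` and `n_clusters=2`, samples sharing most of their locus genes end up together, which mirrors how serogroups with a similar rfb composition would fall into the same class.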

3
  • EPITÁCIO DANTAS DE FARIAS FILHO
  • Transcriptional signature of clear cell renal cell carcinoma based on competitive endogenous RNA

  • Advisor : BEATRIZ STRANSKY FERREIRA
  • COMMITTEE MEMBERS :
  • ALEXANDRE ROSSI PASCHOAL
  • BEATRIZ STRANSKY FERREIRA
  • PATRICK CESAR ALVES TERREMATTE
  • RODRIGO JULIANI SIQUEIRA DALMOLIN
  • Date: Aug 15, 2023
  • Because renal carcinoma develops silently and is multifactorial, it is characterized by a high rate of patients with metastases. After several studies elucidated the activity of coding genes in the metastatic development of renal carcinoma, new studies seek to evaluate the association of non-coding genes, such as competitive endogenous RNA (ceRNA), with the metastatic process. Thus, the aim of this study is to build a transcriptional signature for clear cell renal cell carcinoma (ccRCC) associated with metastatic development from a ceRNA network and to analyze the probable biological functions performed by the participants of the signature. Using ccRCC data from The Cancer Genome Atlas (TCGA), we constructed nine transcriptional signatures from eight feature selection techniques and analyzed the sensitivity and specificity of the predictions of regression models in the benchmarking process. Signature genes were then obtained, and analyses of somatic and copy number changes, risk analyses for survival and metastatic progression, and functional enrichment analyses were performed. In this study we present a transcriptional signature of 10 genes, composed of 2 long non-coding RNAs, SNHG15 and AF117829.1, 2 miRNAs, hsa-miR-130a-3p and hsa-miR-381-3p, and 7 mRNAs, BTBD11, INSR, HECW2, RFLNB, PTTG1, HMMR, and RASD1. Validation using the external dataset of the International Cancer Genome Consortium (ICGC) made it possible to assess the generalization of the signature, which showed an accuracy of 72% and an area under the curve of 81.5%. Genomic analyses identified that the signature participants are located on chromosomes with highly mutated regions (G-index > 2). The genes hsa-miR-130a-3p, AF117829.1, and HECW2 had a significant relationship between expression and patient survival, and the last two have a significant relationship with metastatic development. In addition, functional enrichment was seen in important pathways for tumor development, such as PI3K/AKT, TNF, FoxO, RNA polymerase 2 transcription regulation, cell control, and others. Finally, by analyzing the connections of the signature genes within the ceRNA network in conjunction with studies in the literature, it was possible to obtain an overview of the activities they perform within ccRCC. Therefore, this transcriptional signature can identify non-coding genes as potential biomarkers to be used for a better understanding of renal carcinoma, as well as in the development of future clinical treatments.

4
  • GUSTAVO LOVATTO MICHAELSEN
  • Construction and Validation of a Prognostic Model Integrating Gene Expression and DNA Methylation Data in Medulloblastoma

  • Advisor : MARIALVA SINIGAGLIA
  • COMMITTEE MEMBERS :
  • MARIALVA SINIGAGLIA
  • BEATRIZ STRANSKY FERREIRA
  • CAROLINA NOR
  • Date: Sep 14, 2023
  • Medulloblastoma (MB) is one of the most common pediatric brain tumors, and it is estimated that one-third of patients will die from the disease. The lack of accurate prognostic biomarkers is a major challenge for the clinical improvement of those patients, with conventional prognostic parameters having limited and unreliable correlations with the disease outcome. Acknowledging this issue, our aim was to build a gene signature and evaluate its potential as a new prognostic model for patients with the disease. Hypermethylation of tumor suppressor genes and hypomethylation of oncogenes are methylation dysregulations crucial for tumorigenesis and tumor maintenance, and MB is no exception. In this study, we used six datasets totaling 1679 MB samples, including RNA gene expression and DNA methylation data from primary MB as well as control samples from healthy cerebellum. We identified methylation-driven genes (MDGs) in MB: genes whose expression is correlated with their methylation and which are also differentially methylated in relation to healthy tissue. Next, LASSO regression, a supervised machine learning method, was applied with the MDGs as input, resulting in a two-gene signature (GS-2) of candidate prognostic biomarkers for MB (CEMIP and NCBP3). Using a risk score model, we confirmed the impact of GS-2 on overall survival (OS) with Kaplan-Meier analysis (log-rank p < 0.01). We evaluated its robustness and accuracy with receiver operating characteristic (ROC) curves predicting OS at 1, 3 and 5 years in multiple datasets (training set: 77.2%, 73.2% and 71.2%; mean in three validation sets: 83.6%, 77.6% and 75.4% at 1, 3 and 5 years, respectively). We evaluated GS-2 as an independent prognostic biomarker with multivariable Cox regression, which showed a p-value < 0.01 in all four datasets evaluated. The methylation-regulated GS-2 risk score model can effectively classify patients with MB into high-risk and low-risk groups, reinforcing the importance of this epigenetic modification in the disease. These genes stand out as promising prognostic biomarkers with potential application for MB treatment.

5
  • RUTH FLÁVIA BARROS SETÚBAL
  • Phylogenetic analysis of rfb locus genes of the Leptospira genus of the Sejroe, Mini and Hebdomadis serogroups

  • Advisor : Jorge Estefano de Santana Souza
  • COMMITTEE MEMBERS :
  • Jorge Estefano de Santana Souza
  • Maria Raquel Venturim Cosate
  • TETSU SAKAMOTO
  • Date: Sep 29, 2023
  • Leptospirosis is a zoonosis with a major impact on public health: it is a notifiable disease and occurs mainly in tropical regions with poor sanitation and vulnerable socio-economic conditions. It is caused by bacteria of the genus Leptospira and phylum Spirochaetes, and contamination occurs through direct or indirect contact with the contaminating agent. In addition to taxonomic classification, which is carried out through sequencing and the analysis of some marker genes, such as 16S rRNA and secY, Leptospira samples are usually classified based on their antigenic characteristics into serogroups and serovars. This type of classification is widely used in epidemiological studies and vaccine development. Despite its importance, few studies have been carried out to understand the evolutionary dynamics of the emergence or change of serology in this genus. In view of this, in this study we applied molecular phylogeny methods to understand the evolutionary processes involving the genus' serology. To this end, gene sequences that are part of the rfb locus from samples of the Sejroe, Mini and Hebdomadis serogroups (34 samples) were extracted and submitted to the phylogenetic pipeline, resulting in the inference of 75 maximum likelihood trees. Analysis of the trees shows that the rfb locus genes found in the majority of Leptospira species presented a topology similar to that of the species tree. On the other hand, the genes found in the variable region of the locus showed trees with topologies that suggest the occurrence of lateral transfer between the species L. borgpetersenii and L. kirschneri and between L. interrogans and L. weilii. The study suggests a new interpretation of the evolutionary history of the rfb locus genes and the evolutionary dynamics of serogroup changes.

6
  • HELMUT KENNEDY AZEVEDO DO PATROCÍNIO
  • In silico investigation of nervous system protein peptides as candidates for molecular mimicry in Guillain-Barré syndrome and multiple sclerosis triggered by the Epstein-Barr virus

  • Advisor : JOAO PAULO MATOS SANTOS LIMA
  • COMMITTEE MEMBERS :
  • JOAO FIRMINO RODRIGUES NETO
  • JOAO PAULO MATOS SANTOS LIMA
  • JÉSSIKA DE OLIVEIRA VIANA
  • ÂNDREA KELY CAMPOS RIBEIRO DOS SANTOS
  • Data: Oct 20, 2023


  • Show Abstract
  • Guillain-Barré Syndrome (GBS) and multiple sclerosis are autoimmune diseases associated with an immune response against peripheral (PNS) and central nervous system (CNS) autoantigens, respectively. Most studies on GBS immunopathology investigate the cross-reactivity between myelin sheath ganglioside antigens and carbohydrates from Campylobacter jejuni bacteria. However, GBS has a spectrum of subtypes and, particularly, the Acute Inflammatory Demyelinating Polyradiculoneuropathy (AIDP) form has little evidence of a relationship with C. jejuni or of autoimmunity against gangliosides. The immunopathology of multiple sclerosis is better understood, with several protein autoantigens reported in the literature. In the present work, we screened the databases “The Human Protein Atlas,” AFND, and IEDB to select abundant proteins from the human nervous system (HNS), immunogenic proteins of the Epstein-Barr virus, and HLA haplotypes, respectively. Then we constructed a pipeline with several open-source computational tools to predict HLA binding to peptides and cytokine production. The following analysis used ten proteins from the HNS and 28 from EBV to predict the binding peptides of 21 common HLAs in the world population. From the search for haplotypes in the AFND, we found 1359 registered haplotypes distributed among 51 pairs of HLAs. After that, our pipeline compared nonapeptide anchors of EBV and myelin proteins for identity at critical residues for interaction with the T-cell receptor (TCR), establishing three selection criteria according to the relevance of each contact for TCR-peptide-MHC interaction. According to these criteria, all nervous system proteins presented peptides with relevant identity with EBV peptides. The prediction of IL-4 or IFN-γ cytokine stimulation allowed the discovery of which pairs of similar nonamers can induce the activation of Th1 or Th2 cells and perhaps cause autoimmunity through molecular mimicry. 
Seven proteins (APLP1, CNP, GlialCam, MAG, MBP, Periaxin, and PLP) presented pairs of similar peptides that stimulate the cytokines IL-4, IFN-γ, or both. The P0 protein also presented pairs capable of inducing IL-4 or IFN-γ, though restricted to one or a few HLAs or haplotypes. Given the high number of peptides that could cause molecular mimicry, our results align with the hypothesis that multiple antigens can cause autoimmunity in multiple sclerosis and GBS. The nonamer pairs found here support further experimental investigation of these autoantigens and contribute to a better understanding of both pathologies.

Thesis
1
  • DHIEGO SOUTO ANDRADE
  • TOWARDS ENHANCED PREDICTABILITY IN IMMUNOTHERAPY FOR CANCER THROUGH MACHINE LEARNING: A ROADMAP FOR BUILDING PREDICTIVE MODELS FROM THE T CELL RECEPTOR REPERTOIRE FEATURE ANALYSIS

  • Advisor : CESAR RENNO COSTA
  • COMMITTEE MEMBERS :
  • SOL EFRONI
  • CESAR RENNO COSTA
  • RENAN CIPRIANO MOIOLI
  • RODRIGO JULIANI SIQUEIRA DALMOLIN
  • WILFREDO BLANCO FIGUEROLA
  • Data: Mar 28, 2023


  • Show Abstract
  • Although cancer therapy provides a vast repertoire of medicines and treatments, many cancers develop ways to escape and continue to proliferate. Immunotherapy, in particular, has proved efficient in destroying some types of cancers, but it is not an infallible option. Predicting the efficiency of each treatment option would be a valuable tool for the decision-making process in clinical practice. Immunotherapy enhances the patient’s T cells to attack cancer cells. T cells use a receptor protein on their surface to identify possible targets, such as cancer cells. The advent of NGS (Next Generation Sequencing) brought considerable speed to sequencing large amounts of genetic material, such as the TCR (T Cell Receptor). The diversity of receptors is colossal, and understanding these highly complex repertoires might be the key to deciphering the immune system’s behavior. Here, we evaluated the process of extracting meaningful features from TCR repertoire data to build predictive models that distinguish healthy controls from cancer patients, or patients treated with different drugs. It is therefore essential to develop tools that can easily and quickly generate insights from TCR repertoire data to predict future outcomes. We developed a bioinformatic tool called GENTLE (GENerator of T cell receptor repertoire features for machine LEarning), geared towards any researcher who works with TCR repertoire data and aims to explore these data and build prediction tools. GENTLE is open-source, has a web platform, and can be installed locally. It implements many diversity metrics, builds networks using the Levenshtein distance, calculates the frequency of motifs, transforms the data with dimensional reduction methods, implements normalization methods, performs feature selection, and builds, evaluates, and deploys classifiers. Using this tool, one can glean valuable insights from TCR repertoire data.
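The network-building step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that two receptor sequences are linked when their Levenshtein (edit) distance does not exceed a threshold; the function names, the threshold of 1, and the example CDR3 sequences are hypothetical and are not taken from GENTLE itself.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def build_network(sequences, max_distance=1):
    """Return edges between unique sequences whose edit distance is <= max_distance."""
    unique = sorted(set(sequences))
    return [(s, t) for s, t in combinations(unique, 2)
            if levenshtein(s, t) <= max_distance]


# Hypothetical CDR3 amino acid sequences, for illustration only.
cdr3 = ["CASSLGQYF", "CASSLGQFF", "CASSPGQYF", "CAWSVDRGNTIYF"]
edges = build_network(cdr3)
```

The pairwise comparison is quadratic in the number of unique sequences, so a real repertoire would likely need an indexed or batched approach.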

     

2
  • JÉSSIKA DE OLIVEIRA VIANA
  • In silico design, synthesis and activity of spiro-acridine derivatives.

  • Advisor : EUZEBIO GUIMARAES BARBOSA
  • COMMITTEE MEMBERS :
  • IGOR JOSÉ DOS SANTOS NASCIMENTO
  • EDILSON BESERRA DE ALENCAR FILHO
  • EUZEBIO GUIMARAES BARBOSA
  • JOAO PAULO MATOS SANTOS LIMA
  • MARCELO DE SOUSA DA SILVA
  • Data: Jun 16, 2023


  • Show Abstract
  • Bioactive compounds have been studied in order to offer better efficacy and selectivity against various diseases, representing a promising scenario in drug development. Recently, a series of acridinic derivatives was synthesized and exhibited antileishmanial and anticancer activity. However, the concept of "one target, one drug, one disease" is not always true, as compounds with previously described therapeutic applications can act on more than one target. Based on this, this work aimed to identify, through reverse virtual screening based on the receptor, the probable mechanism of action of spiro-acridinic derivatives. Additionally, the mechanism of action was confirmed through in vitro enzymatic assays. Using these approaches, Chapter I of this work presents the identification, through computational methodologies, of the pteridine reductase 1 (PTR1) enzyme of L. major as a potential target for spiro-acridinic compounds. Additionally, we found the chitinase B1 (CHIB1) enzyme of Aspergillus fumigatus as a potential target against Aspergillosis. For PTR1, docking and molecular dynamics assays presented the high stability of compound 1 in the active site of the enzyme. For CHIB1, other derivatives were subjected to molecular docking and molecular dynamics, identifying 3 compounds with the best profile for the target. In Chapter II, in vitro assays were performed to experimentally confirm the action of spiro-acridinic derivatives on the studied enzymes. For PTR1, in vitro assays demonstrated a KD of 33.1 μM for the best compound, while for chitinase, the best compound showed an IC50 of 0.6 ng/μL. Therefore, this work demonstrated the high efficiency of reverse virtual screening as a target prediction approach. Additionally, the program allowed for characterizing its potency, inhibition modality, and interaction profile with its therapeutic target. Thus, spiro-acridinic derivatives can act as multi-target inhibitors of Leishmania's PTR1 and fungal chitinase.

2022
Dissertations
1
  • MARIA JULIA PEREIRA DAVI
  • Design and in silico validation of polymerase chain reaction primers to detect severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)

  • Advisor : DANIEL CARLOS FERREIRA LANZA
  • COMMITTEE MEMBERS :
  • DANIEL CARLOS FERREIRA LANZA
  • RODRIGO JULIANI SIQUEIRA DALMOLIN
  • TAFFAREL MELO TORRES
  • Data: Apr 6, 2022


  • Show Abstract
  • The design of polymerase chain reaction (PCR) primers that target conserved segments of viral genomes is important to prevent false-negative results and reduce the need to standardize different PCR protocols for the same target. In this work, we designed and described a set of primers and probes that target conserved regions identified from a multiple sequence alignment of 2,341 SARS-CoV-2 genomes available in the GISAID (Global Initiative on Sharing All Influenza Data) database. Subsequently, the primers were validated together with the probes on 211,833 complete SARS-CoV-2 genome sequences. Nine systems (forward primer + reverse primer + probe) were obtained that potentially anneal to the highly conserved regions of the virus genome identified in this analysis. In silico predictions also demonstrated that the primers do not interact with non-specific targets in sequences from humans, bacteria, fungi, Apicomplexa, other betacoronaviruses, and less pathogenic coronavirus strains. The publication of these primer and probe sequences will make it possible to validate more efficient protocols for identifying SARS-CoV-2.
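As a rough illustration of in silico primer validation, the sketch below scans each genome for sites where a primer anneals with at most a given number of mismatches and reports the fraction of genomes covered. The function names and the short toy sequences are hypothetical; the abstract does not state which tools or mismatch thresholds the work used.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(primer: str, genome: str, max_mismatches: int = 0):
    """List (start, mismatches) for every window matching the primer closely enough."""
    sites = []
    for start in range(len(genome) - len(primer) + 1):
        window = genome[start:start + len(primer)]
        mismatches = sum(1 for p, g in zip(primer, window) if p != g)
        if mismatches <= max_mismatches:
            sites.append((start, mismatches))
    return sites


def fraction_detected(primer: str, genomes, max_mismatches: int = 0) -> float:
    """Fraction of genomes with at least one acceptable binding site."""
    hits = sum(1 for g in genomes if find_sites(primer, g, max_mismatches))
    return hits / len(genomes)


# Toy sequences for illustration only; not real SARS-CoV-2 data.
genomes = ["TTACGTACGGA", "TTACGAACGGA", "GGGGGGGGGGG"]
forward_primer = "ACGTACG"
exact = fraction_detected(forward_primer, genomes)
relaxed = fraction_detected(forward_primer, genomes, max_mismatches=1)
```

A reverse primer would be searched as `reverse_complement(primer)` on the same strand. Specificity checks against non-target organisms follow the same pattern, with the expectation that no site is found.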

2
  • MATHEUS GIBEKE SIQUEIRA DALMOLIN
  • Systems biology-based analysis highlights altered processes that impact overall survival of Ewing Sarcoma patients

  • Advisor : MARIALVA SINIGAGLIA
  • COMMITTEE MEMBERS :
  • MARIALVA SINIGAGLIA
  • RITA MARIA CUNHA DE ALMEIDA
  • LAURO JOSÉ GREGIANIN
  • Data: Apr 6, 2022


  • Show Abstract
  • Ewing’s Sarcoma (ES) is a highly aggressive disease and the second most frequent pediatric bone neoplasm. The ES hallmark is the presence of the aberrant transcription factor EWSR1-FLI1, which drives metabolic reprogramming in ES. The ES survival rate has increased at the cost of high toxicity that limits survival rates and causes significant morbidity. It is therefore crucial to identify and fully understand the pathways that impact ES survival in order to develop novel diagnostic and therapeutic strategies. Here, we identified differences at the transcriptional level between ES patients classified as short-term survivors (STS) and long-term survivors (LTS), based on transcriptional data available in three public datasets and applying transcriptogram analysis. Three differentially expressed clusters common across the cohorts analyzed were identified. Processes related to DNA damage response and repair, immune response, apoptosis and autophagy were dysregulated between the STS and LTS groups. Furthermore, the functional enrichment of the genes common to the three clusters and ES regulons highlights the upregulation of the Hippo pathway in STS patients. Our analysis suggests that different processes may be guiding the outcome of ES patients in an integrated way and may contribute to the diversity of phenotypes driven by the EWSR1-FLI1 expression fluctuation.

3
  • DÉBORA VIRGÍNIA DA COSTA E LIMA
  • The Use of Artificial Neural Networks in the Analysis of Lung Cancer Data

  • Advisor : ADRIAO DUARTE DORIA NETO
  • COMMITTEE MEMBERS :
  • ADRIAO DUARTE DORIA NETO
  • BEATRIZ STRANSKY FERREIRA
  • TAFFAREL MELO TORRES
  • TETSU SAKAMOTO
  • Data: May 12, 2022


  • Show Abstract
  • Lung cancer is the leading cause of cancer death worldwide and has a high incidence. Like other types of cancer, it can have different causes, from genetic to environmental factors, so studies using different types of data may be relevant for the control of this neoplasm, especially when considering factors that have an impact on patient survival. In the context of lung cancer, this study uses deep learning to predict patient survival. Clinical and molecular data from TCGA (The Cancer Genome Atlas) databases were obtained for the LUSC (Lung Squamous Cell Carcinoma) and LUAD (Lung Adenocarcinoma) cohorts. This was followed by analysis of the genomic alterations, application of neural networks using the frequently mutated genes of each cohort as input, selection of key genes, and validation with another database. The cohorts showed differences in survival when subjected to the Kaplan-Meier method and the Log-Rank test. In the genomic analysis, all genes with a mutation frequency above 15% were selected, yielding 34 genes for LUAD and 32 for LUSC. Using these genes as input made it possible to generate LUSC and LUAD networks with 100% accuracy in identifying, according to the mutations, whether the patient was alive or dead. In addition, a LUSC network validated with another database, LUSC-KR, reached 99% accuracy. This work thus showed that combining frequently mutated genes with deep learning is a robust tool that allows the survival of patients with lung cancer to be predicted.
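The gene-selection step (mutation frequency above 15%) can be sketched as follows. The sketch assumes that frequency means the fraction of patients in a cohort with at least one mutation in the gene; the function name and the toy records are hypothetical and are not results from the dissertation.

```python
from collections import Counter


def frequently_mutated_genes(mutations, n_patients, threshold=0.15):
    """Select genes mutated in more than `threshold` of patients.

    `mutations` is an iterable of (patient_id, gene) pairs; a gene is
    counted at most once per patient.
    """
    patients_per_gene = Counter(gene for _, gene in set(mutations))
    return sorted(gene for gene, count in patients_per_gene.items()
                  if count / n_patients > threshold)


# Toy cohort of 10 patients, for illustration only.
records = (
    [(f"P{i}", "TP53") for i in range(5)]    # 5/10 patients
    + [(f"P{i}", "KRAS") for i in range(2)]  # 2/10 patients
    + [("P0", "KRAS")]                       # duplicate record, ignored
    + [("P9", "EGFR")]                       # 1/10 patients
)
selected = frequently_mutated_genes(records, n_patients=10)
```

The selected genes could then be encoded as binary features (mutated or not) per patient before being passed to a classifier.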

     
4
  • BIANCA CRISTIANE FERREIRA SANTIAGO
  • METAGENOMIC ANALYSES REVEAL THE INFLUENCE OF DEPTH LAYERS ON MARINE BIODIVERSITY IN TROPICAL AND SUBTROPICAL REGIONS

  • Advisor : RODRIGO JULIANI SIQUEIRA DALMOLIN
  • COMMITTEE MEMBERS :
  • RODRIGO JULIANI SIQUEIRA DALMOLIN
  • CESAR RENNO COSTA
  • GABRIEL DA LUZ WALLAU
  • Data: Oct 28, 2022


  • Show Abstract
  • About 71% of the Earth's surface is covered by ocean, which contains 97% of all the Earth's water and has plankton as its dominant life form. Microorganisms play an essential role in maintaining the planet's system, since they are sources of energy and nutrients for living beings, carry out almost half of net primary production, maintain chemical balance in the atmosphere and export photosynthetically fixed carbon to the deepest layers of the ocean. Ocean ecosystem biology studies how biotic and abiotic factors determine ocean ecosystem properties. With the emergence of global-scale studies of open sea organisms, much has been discovered about the genomics of oceanic microbial communities. One of the main projects in this category today is Tara Oceans, a multidisciplinary project that aims to understand the diversity, interactions, functions and phenotypic complexity of plankton at taxonomic and spatial scales, through about 35,000 samples with millions of organisms collected at depths of up to 2,000 m at 210 stations across the oceans. The main objective of this work was to compare oceanic samples, exploring the abundance, diversity, and function of microorganisms found in tropical and subtropical zones, in order to understand how these factors affect the biodiversity of these species. For this, only collection stations with samples in all three depth layers (SRF, DCM and MES) were selected. This filtering resulted in 8 stations with a total of 76 samples. These data were processed through the MEDUSA pipeline, following the standard flow of a metagenomic analysis: pre-processing, sequence alignment against a reference protein database, taxonomic classification and functional annotation. With the results of this process, it was possible to compare the abundance and diversity of these samples and to analyze the results of this comparison.
A greater diversity of organisms was observed in the deepest layer (MES) and there was no significant difference in abundance between the explored depth layers. Some biological functions are unique to each depth layer, indicating its particularities and functional diversity. No significant distinction of abundance, diversity or function was observed comparing exclusively samples from collection stations in different geographic points.

5
  • PRISCILA CAROLINE DE SOUSA COSTA
  • IDENTIFICATION OF REMOTE HOMOLOGUES USING PROTEIN STRUCTURAL ALIGNMENT AND MACHINE LEARNING TOOLS

  • Advisor : TETSU SAKAMOTO
  • COMMITTEE MEMBERS :
  • LUCAS BLEICHER
  • PATRICK CESAR ALVES TERREMATTE
  • TETSU SAKAMOTO
  • Data: Dec 15, 2022


  • Show Abstract
  • Proteomics studies have revealed a large number of proteins and shown their importance to the study of life. However, a high percentage of these proteins have not been functionally annotated, which limits advances in several areas such as healthcare and biotechnology. The functions of proteins are defined by their conformation and by changes in their three-dimensional structure, so data on the three-dimensional structure of these proteins help in defining their functions. Currently, a large number and diversity of proteins have had their sequence characterized, but there is still a methodological bottleneck in obtaining their structural data. The recent development of the AlphaFold program, which accurately predicts the three-dimensional structure of proteins from their amino acid sequence, can overcome this bottleneck. Thus, the goal of this project is to evaluate the impact of using these structural prediction tools on the functional annotation of proteins. In this work, we aim to functionally annotate protein domains of unknown function (DUF). To this end, predicted three-dimensional structures were submitted to computational tools that search for other structures with structural similarity. Preliminary analyses have shown that many domains can benefit from this analysis. In addition, we generated a classification model that identifies whether two proteins that share a structural similarity are remote homologs. This classifier will be used in the future to analyze the similarity results and suggest functions for these domains.

Thesis
1
  • EMMANUEL DUARTE BARBOSA
  • Investigation of protein-ligand complexes by methods of quantum biochemistry and molecular evolution

  • Advisor : UMBERTO LAINO FULCO
  • COMMITTEE MEMBERS :
  • UMBERTO LAINO FULCO
  • JOAO PAULO MATOS SANTOS LIMA
  • EUDENILSON LINS DE ALBUQUERQUE
  • LUIZ ANTONIO RIBEIRO JUNIOR
  • VALDER NOGUEIRA FREIRE
  • Data: Feb 21, 2022


  • Show Abstract
  • This thesis presents three studies carried out in the sphere of molecular modeling based on principles of Quantum Mechanics. Additionally, molecular evolution methods complemented some results. The first study portrays the particularities of the energy results and computational cost of nine combinations of models based on Density Functional Theory (DFT) in an organometallic system formed by the divalent zinc cation and the enzyme Porphobilinogen Synthase (PBGS). The interaction energies were obtained using the Molecular Fractionation with Conjugate Caps (MFCC) scheme. The results of the total interaction energy profile showed linear quantitative differences, but were qualitatively uniform. The computational processing time depends more on the choice of basis set than on the exchange and correlation functional. The second study presents a biochemical description based on the interaction energy results obtained in the previous study, analyzing the biochemical profile of the most relevant PBGS residues that interact with zinc. In addition, a phylogenetic and cluster analysis was performed to evaluate the conservation of the relevant amino acids identified in the zinc-PBGS system. The most important intermolecular interactions were due to the participation of amino acids CIS0122, CIS0124, CIS0132, ASP0169, SER0168, ARG0221, HIS0131, ASP0120, GLI0133, VAL0121, ARG0209, and ARG0174. Among these residues, ASP0120, GLI0133, HIS0131, SER0168, and ARG0209 stood out for occurring in all groups generated by the unsupervised cluster analysis. On the other hand, the three cysteines within 2.5 Å of zinc (CIS0122, CIS0124, and CIS0132) showed the highest attraction energy and are absent in Viridiplantae, Sar, Rhodophyta, and in some groups of Bacteria.
The third study investigates the interactions between the Lys49-PLA2 toxin from the venom of Bothrops moojeni, which causes tissue necrosis in snakebite victims, and two compounds (varespladib and aspirin) with the potential to inhibit the myotoxic activity of these proteins. The methodology also uses quantum methods based on DFT within the MFCC scheme. From this study, it was possible to predict the relevance of the amino acids that form the Lys49-PLA2 binding site. These include LIS0069, LIS0049, LEU0005, ILE0009, CIS0029, GLI0030, HIS0048, PRO0018, ALA0019, CIS0045, TIR0052, TIR0022, PRO0125*, and FEN0126*, which anchor varespladib, and residues LIS0069, LIS0049, GLI0032, LEU0002, and LEU0005, which anchor aspirin.

2
  • DIEGO ARTHUR DE AZEVEDO MORAIS
  • MEDUSA: A PIPELINE FOR TAXONOMIC CLASSIFICATION AND FUNCTIONAL ANNOTATION OF METAGENOMES

  • Advisor : RODRIGO JULIANI SIQUEIRA DALMOLIN
  • COMMITTEE MEMBERS :
  • RODRIGO JULIANI SIQUEIRA DALMOLIN
  • Jorge Estefano de Santana Souza
  • LUCYMARA FASSARELLA AGNEZ LIMA
  • DIEVAL GUIZELINI
  • FABIANO CORDEIRO MOREIRA
  • Data: Apr 14, 2022


  • Show Abstract
  • Metagenomics involves the study of the microbial community found in a sample extracted from a given environment. This environment may be a cave wall, a portion of ocean water, the human gut, or any source containing microorganisms of interest. Such studies unravel details about the taxonomic composition and the functions performed by microbial communities. As a complete metagenomic analysis requires different tools for different purposes, the selection and setup of these tools remain challenging. Furthermore, the chosen toolset will affect the accuracy, the formatting, and the functional identifiers reported in the results, impacting the results interpretation and the biological answer obtained. The work presented here aims to propose a pipeline to be used in taxonomic and functional metagenomic analyses. To this end, state-of-the-art tools available in the literature were surveyed, and mock datasets were created to perform benchmarks. As a result, suited tools were selected for each analysis step, and a sensitive and flexible metagenomic analysis pipeline was designed. MEDUSA, an efficient pipeline to conduct comprehensive metagenomic analyses, performs preprocessing, assembly, alignment, taxonomic classification, and functional annotation on shotgun data, supporting user-built dictionaries to transfer annotations to any functional identifier. MEDUSA includes several tools, such as fastp, Bowtie2, DIAMOND, Kaiju, MEGAHIT, and a novel tool implemented in Python to transfer annotations to BLAST/DIAMOND alignment results. These tools are installed via Conda, and the workflow is managed by Snakemake, easing the setup and execution. Compared with MEGAN 6 Community Edition, MEDUSA correctly identifies more species, especially the less abundant, and is more suited for functional analysis using Gene Ontology identifiers.

3
  • PATRICK CESAR ALVES TERREMATTE
  • A Novel Machine Learning 13-Gene Signature: Improving Risk Analysis and Survival Prediction for Clear Cell Renal Cell Carcinoma Patients

  • Advisor : ADRIAO DUARTE DORIA NETO
  • COMMITTEE MEMBERS :
  • ADRIAO DUARTE DORIA NETO
  • BEATRIZ STRANSKY FERREIRA
  • CICILIA RAQUEL MAIA LEITE
  • DANIEL SABINO AMORIM DE ARAUJO
  • PAULO PIMENTEL DE ASSUMPÇÃO
  • TETSU SAKAMOTO
  • Data: May 13, 2022


  • Show Abstract
  • Patients with clear cell renal cell carcinoma (ccRCC) have poor survival outcomes, especially when the disease has metastasized. It is of paramount importance to identify biomarkers in genomic data that could help predict the aggressiveness of ccRCC and its resistance to drugs. Thus, we conducted a study with the aims of evaluating gene signatures and proposing a novel one with higher predictive power and generalization than the previous signatures. Using ccRCC cohorts of the Cancer Genome Atlas (TCGA-KIRC) and the International Cancer Genome Consortium (ICGC-RECA), we evaluated linear survival models of Cox regression with 14 signatures and six methods of feature selection, and performed functional analysis and differential gene expression approaches. In this study, we established a 13-gene signature (AR, AL353637.1, DPP6, FOXJ1, GNB3, HHLA2, IL4, LIMCH1, LINC01732, OTX1, SAA1, SEMA3G, ZIC2) whose expression levels are able to predict distinct outcomes of patients with ccRCC. Moreover, we compared our signature with others from the literature. The best-performing gene signature was obtained using the ensemble method Min-Redundancy and Max-Relevance (mRMR). This signature has unique features in comparison to the others, such as generalization across different cohorts and functional enrichment in significant pathways: urothelial carcinoma, chronic kidney disease, transitional cell carcinoma, and nephrolithiasis. Of the 13 genes in our signature, eight are known to be correlated with ccRCC patient survival and four are immune-related. Our model achieved a Receiver Operating Characteristic (ROC) Area Under the Curve (AUC) of 0.82 and generalized well between the cohorts. Our findings revealed two clusters of genes, with high expression (SAA1, OTX1, ZIC2, LINC01732, GNB3 and IL4) and low expression (AL353637.1, AR, HHLA2, LIMCH1, SEMA3G, DPP6, and FOXJ1), both of which are correlated with poor prognosis.
This signature can potentially be used in clinical practice to support patient treatment care and follow-up.

4
  • IARA DANTAS DE SOUZA
  • Sex-specific transcriptional alteration analysis of major depressive disorder

  • Advisor : RODRIGO JULIANI SIQUEIRA DALMOLIN
  • COMMITTEE MEMBERS :
  • GLORIA REGINA FRANCO
  • GUSTAVO ANTONIO DE SOUZA
  • JOAO PAULO MATOS SANTOS LIMA
  • MATHEUS AUGUSTO DE BITTENCOURT PASQUALI
  • RODRIGO JULIANI SIQUEIRA DALMOLIN
  • Data: Jul 8, 2022


  • Show Abstract
  • Major depressive disorder (MDD) is an important neuropsychiatric disorder with high prevalence in Brazil, characterized by persistent depressed mood and/or loss of pleasure for at least two weeks. MDD is a disabling condition that predisposes to other complex pathologies, such as cardiovascular diseases, and may even result in suicide. MDD is more prevalent in women than in men, and there are anatomical, immunological, neuronal and hormonal differences that are reflected in different prognoses and symptoms between the sexes. However, there is no consensus regarding the MDD transcriptional alterations in men and women, or the functional implications of these alterations for cellular metabolism. Most MDD transcriptional studies explain the disease’s pathophysiology by looking for changes in global gene expression. However, gene expression changes can also occur at the transcript level, as RNA splicing pathways may be altered. The present work investigates the transcriptional alterations of MDD in women and men through differential gene expression (DGE) analysis, differential transcript expression (DTE) analysis and differential transcript usage (DTU) analysis in post-mortem samples of six brain regions. The set of genes identified by at least one of the three approaches was called transcriptionally altered genes (TAGs), which represent the comprehensive transcriptional alteration profile of MDD. In total, 1075 TAGs were identified, mainly in the prefrontal cortex. Approximately half of the transcriptional changes occurred only at the transcript level. We found a near absence of overlap between the altered genes identified in men and those identified in women. This indicates that the MDD transcriptional alteration profile is sex-specific, considering both the gene- and the transcript-level alterations. We verified alterations in the RNA processing and export pathways in the orbitofrontal cortex of women.
Additionally, the DDX39B gene, a member of the RNA splicing machinery, was altered in different brain regions in women and in men. Furthermore, we showed that the ATAT1 gene is altered in multiple brain regions of women and the ABR gene is altered in multiple brain regions of men, constituting potential sex-specific biomarkers for MDD.

5
  • THAÍS DE ALMEIDA RATIS RAMOS
  • Single cell systems biology of long non-coding RNAs associated with cardiac tissue development and cardiovascular disease

  • Advisor : VINICIUS RAMOS HENRIQUES MARACAJA COUTINHO
  • COMMITTEE MEMBERS :
  • VINICIUS RAMOS HENRIQUES MARACAJA COUTINHO
  • RODRIGO JULIANI SIQUEIRA DALMOLIN
  • THAIS GAUDENCIO DO REGO
  • GILDERLANIO SANTANA DE ARAÚJO
  • YURI DE ALMEIDA MALHEIROS BARBOSA
  • Data: Aug 2, 2022


  • Show Abstract
  • Long non-coding RNAs (lncRNAs) comprise the most representative transcriptional units of the mammalian genome, and they are associated with organ development, which can in turn be linked to the emergence of diseases such as cardiovascular diseases. The World Health Organization (WHO), for example, has reported that cardiovascular diseases are responsible for the death of 17.9 million people each year, corresponding to 31% of all deaths worldwide. Therefore, a combination of transcripts from the Gencode (M20), Ensembl (GRCm38.95) and Amaral et al. (2018) databases was used to define the set of non-redundant reference lncRNAs, and Gencode (M20) was used for the reference coding transcripts. In addition, bioinformatics approaches, machine learning algorithms and statistical techniques were used to define lncRNAs involved in mammalian cardiac development from a single-cell perspective. For this, we used the single-cell database published by DeLaughter et al. (2016), which contains data from four embryonic stages (E9.5, E11.5, E14.5, E18.5) and four postnatal stages (P0, P3, P7, P21) of the model organism Mus musculus. Our study identified 8 distinct cell types and novel marker transcripts (coding/lncRNAs). Differential expression and functional enrichment analysis revealed cardiomyocyte subpopulations associated with cardiac function, while modular co-expression analysis revealed cell-specific functional insights for lncRNAs during myocardial development, including a potential association with key genes related to disease and the “fetal gene program”. Our results provide evidence for the role of particular lncRNAs in heart development and highlight the use of modular co-expression approaches in the functional definition of cell types. As future work, we intend to identify the functional roles of these RNAs in the development of cardiac tissues and in cardiovascular diseases using experimental validation approaches.

6
  • ALYSON MATHEUS DE CARVALHO SOUZA
  • Novel Virtual Reality Methodologies Applied to Bioinformatics: A Perspective in Education and Research

  • Advisor : CESAR RENNO COSTA
  • COMMITTEE MEMBERS :
  • CESAR RENNO COSTA
  • RENAN CIPRIANO MOIOLI
  • CLEBER DA SILVEIRA CAMPOS
  • ROSILANE RIBEIRO DA MOTA
  • JONATAS MANZOLLI
  • Data: Nov 29, 2022


  • Show Abstract
  • Virtual Reality (VR) has been evolving rapidly and becoming more accessible to other areas of research, as experiences become easier to develop and specific equipment easier to acquire. As a result, integrating VR with other areas of knowledge creates several research opportunities. In neurosciences and cognitive sciences, VR has been used in two main ways: to bring the real world to the laboratory through simulations, increasing the ecological validity of experiments, or as a platform to create impossible situations, studying users through a window that was not available before. In education, VR has been seen as a means to include other forms of teaching in the student’s daily life, moving away from traditional teaching and increasing engagement with ideas such as embodied cognition, which uses the body to learn and store information. Based on these aspects of VR integration, this cumulative thesis presents nine works developed within these two themes, aiming to propose and implement new work methodologies in cognitive sciences, neurosciences, arts, and education using VR. The works presented are discussed regarding their relevance and innovative aspects, and, finally, we outline some opportunities for future work building on the texts presented.

7
  • LUCAS MARQUES DA CUNHA
  • Development of a computational approach for the analysis and identification of polymorphic peptides

  • Advisor : GUSTAVO ANTONIO DE SOUZA
  • COMMITTEE MEMBERS :
  • FABIO PASSETTI
  • ADRIANA FERREIRA UCHOA
  • DANIEL CARLOS FERREIRA LANZA
  • GUSTAVO ANTONIO DE SOUZA
  • PAULO COSTA CARVALHO
  • Date: Nov 29, 2022


  • The proteomic approach allows large-scale studies of protein expression in different tissues and body fluids, aiming to identify and quantify the total protein content. In the proteomic analysis process, protein identification still presents limitations despite major advances in the area. Frequently, a mass spectrometer is used to generate mass/charge values of the samples. After this process, a reference protein database (e.g., UniProt) is usually used to identify proteins. However, using a reference database limits protein identification, since it does not contain the variations in the DNA that can alter the amino acid sequence, causing incorrect identification or making identification impossible. In this context, there are several custom databases that incorporate such genetic variations. Although they present good results, they are also limited by the absence of some mutations, which becomes another problem in the identification process. A proteogenomics database (dbPepVar) created here combines genetic variation information from dbSNP with protein sequences from NCBI's RefSeq. Public mass spectrometry datasets were used to perform a pan-cancer analysis (ovarian, colorectal, breast, and prostate), allowing the identification of unique genetic variations. In total, 3,726 variant peptides were identified in ovarian cancer samples, 2,543 in prostate, 2,661 in breast and 2,411 in colorectal cancer. A mutational frequency analysis revealed genes involved in tumor progression processes, sensitivity to chemotherapy, and risk of susceptibility to cancer. Interestingly, in many samples, C-terminal peptides from shortened proteins originating from premature termination codon (PTC) events were identified. This indicates that such proteins had escaped nonsense-mediated decay (NMD) and, not surprisingly, NMD machinery genes are also mutated in the same samples. This suggests that the vestige of the truncated transcript may be associated with NMD machinery inefficiency caused by gene mutations. In perspective, the web portal developed, as well as the analysis performed, may direct studies to identify new therapeutic targets for different cancers, and our database can also be used to characterize variants in samples of unknown genetic background, such as archived samples. The portal is available at: https://bioinfo.imd.ufrn.br/dbPepVar/

8
  • DANIEL SOARES BRANDAO
  • Investigation of the cognitive functions of sleep and dreams through electroencephalography, verbal reports and electronic games

  • Advisor : SIDARTA TOLLENDAL GOMES RIBEIRO
  • COMMITTEE MEMBERS :
  • DANIEL YASUMASA TAKAHASHI
  • FELIPE BEIJAMINI
  • GUILHERME BROCKINGTON
  • MARIO ANDRE LEOCADIO MIGUEL
  • SIDARTA TOLLENDAL GOMES RIBEIRO
  • Date: Dec 14, 2022


  • Sleep is an important bodily and mental state for the elimination of toxins generated by metabolism and for the consolidation of memories. It is a very conserved state throughout animal evolution, being present in all species of reptiles, birds and mammals already studied, as well as several invertebrates. Due to its high evolutionary conservation, it is very likely that sleep had a great influence on the constitution of the different behaviors found in animals. The importance of sleep for memory consolidation has established the fundamental role of this phenomenon in improving task performance. Furthermore, it has recently been shown that dreaming is also involved in improving task performance. The Threat Simulation Theory by Revonsuo and Valli (2000) proposed that dreaming would have been selected throughout evolution for its adaptive value, functioning as an alert for the possibility of future threats. Could the evolution of the different habits of prey and predators have been influenced by sleep and/or dreams? The investigation of the role of sleep and dreams in prey versus predator relationships in humans is quite promising, both because humans can communicate the content of dreams they had, and because of the possibility of developing complex tasks using video games that simulate prey versus predator situations that would be difficult to emulate in animal models.
    In this context, experiments were carried out with 15 pairs of volunteers, who came together to the laboratory and had their brain activity recorded simultaneously through electroencephalography (EEG). During the recording, each pair engaged in an interactive electronic game for 45 minutes, then lay down to sleep for 2 hours and then played again for another 45 minutes. During the game, one of the participants was randomly selected to play the role of prey and the other to play the role of predator. The prey could kill the opponent only with punches, while the predator also had a firearm. Therefore, the predator had a great advantage in the direct dispute with the prey, as happens in nature. Dream reports were assessed by 4 independent evaluators who reviewed the reports blindly. The evaluators indicated the degree of certainty that the participant actually dreamed and the degree of clarity of this memory; they also determined whether the dreams were related to the game, the laboratory, personal life, being prey and being a predator. The EEG signals were analyzed automatically, through data processing algorithms developed specifically for this study, with the sequence of data transformations adapted after visual inspection of the results. The power of oscillations in characteristic frequency bands, the properties of slow oscillations and sleep spindles, the characteristics of sleep stages and sleep scales were evaluated. An analytical technique that searches for recurrent patterns of spatial distribution of electrical activity was also applied to the EEG signal; such microstates are related to the activities of specific neural circuits through the labels “A”, “B”, “C” and “D”.
    The results indicate that prey reported dreaming more than predators, and that prey scores were positively correlated with how much the dream report was related to the game. Prey also benefited more than predators from having a deeper sleep, which also correlated with prey score. The prey had higher power in the delta band (1 to 3 Hz), which also favored the prey score, mainly through the amplitude of the slow oscillations during sleep. No significant effect was found for sleep spindles. The prey's performance was impaired by the number of occurrences of microstate C, which is associated with neural activations not specifically related to the proposed task.
    Taken together, the results suggest that slow waves during sleep and game-related dream content favorably influence participants' performance in the prey role, but not in the predator role. A possible explanation for this dichotomy would be that sleep and dreams are important for adapting to challenging situations, not being so relevant in situations to which the individual is already adapted.

2021
Dissertations
1
  • PITÁGORAS DE AZEVEDO ALVES SOBRINHO
  • RNA-Gatherer: a computational tool for annotation of non-coding RNAs in understudied organisms

  • Advisor : WILFREDO BLANCO FIGUEROLA
  • COMMITTEE MEMBERS :
  • WILFREDO BLANCO FIGUEROLA
  • Jorge Estefano de Santana Souza
  • ÂNDREA KELY CAMPOS RIBEIRO DOS SANTOS
  • Date: Jan 29, 2021


  • Non-coding RNAs are molecules that play decisive roles in several types of gene regulation. Identifying them is necessary for understanding the genetics of a species. Several factors, such as low expression levels, the broad spectrum of subtypes, diverse attributes, heterogeneous functions and absence of homology between species, make the detection of ncRNA genes a challenge. The latest bioinformatics strategies for detecting ncRNA genes have tried to identify their locations in the genomes and their secondary structures, using covariance models and artificial intelligence. The co-expression of these genes has been computationally analyzed in order to reveal their functional annotations. However, there is no consensus on which metrics and parameters to use in the process of predicting the functions of these molecules. In little-known organisms, such as Arapaima gigas, the lack of reference information increases the difficulty. Additionally, even for known long non-coding RNAs, there is little functional information, which makes it difficult to explain the roles of these genes. In this work, we describe software for discovering non-coding genes, including their diverse types, and their functions in eukaryotic genomes. It was validated by annotating a model species (Mus musculus) and then used to explore the landscape of ncRNA in Arapaima gigas. Comparing the similarity between the functions of co-expressed genes allowed us to define confidence levels for the metrics that measure co-expression and thus develop a pipeline for predicting lncRNA functions, which includes metrics for non-linear correlations. The described software suite made 63307 non-coding annotations in A. gigas, including 11 types of ncRNA and 4 types of cis-regulatory regions. Of these annotations, only 706 are similar to ncRNAs already known in other species; the remainder had never been described before. The exploratory analysis of lncRNA also revealed 19854 tissue-specific lncRNAs and 256 ubiquitously expressed lncRNAs. Predicting the functions of these molecules revealed RNAs involved in skin pigmentation, sex differentiation, growth and defense against tumors.

2
  • TAYRONE DE SOUSA MONTEIRO
  • Reverse engineering of medulloblastoma regulatory networks and inference of master regulators

  • Advisor : RODRIGO JULIANI SIQUEIRA DALMOLIN
  • COMMITTEE MEMBERS :
  • RODRIGO JULIANI SIQUEIRA DALMOLIN
  • RITA MARIA CUNHA DE ALMEIDA
  • MARIALVA SINIGAGLIA
  • Date: Aug 31, 2021


  • Medulloblastoma (MB) is a cancer of the cerebellum occurring most frequently in the pediatric population. This tumor is classified into four distinct molecular subgroups (WNT, SHH, group 3 and group 4), each one also presenting unique clinical features. Some medulloblastoma epigenetic drivers have been reported, although the inference of regulatory networks and master regulators has been reported only once. Here, we inferred the transcriptional regulatory networks of the SHH, group 3 and group 4 subgroups and identified 10 regulatory units that are simultaneously master regulators and differentially methylated regulons in all investigated subgroups, subsequently named the “regulons of interest”. The activity pattern of these regulons was observed to vary across subgroups. KEGG pathway enrichment analysis was also performed, considering the content of all regulons of interest in each regulatory network. Two KEGG terms were found concomitantly in all investigated subgroups. This work contributes to the comprehension of the medulloblastoma regulome, identifying prospective master regulators, analyzing their methylome and pointing to potential therapeutic targets.

3
  • LUKAS IOHAN DA CRUZ CARVALHO
  • Analysis of modular gene co-expression networks reveals molecular pathways underlying Alzheimer’s disease and progressive supranuclear palsy

  • Advisor : MARCOS ROMUALDO COSTA
  • COMMITTEE MEMBERS :
  • MARCOS ROMUALDO COSTA
  • RODRIGO JULIANI SIQUEIRA DALMOLIN
  • TARCISO ANDRE FERREIRA VELHO
  • RICARDO AUGUSTO DE MELO REIS
  • Date: Sep 28, 2021


  • The incidence of neurodegenerative diseases leading to impairment of cognitive functions and dementia has increased in recent years, mainly because of enhanced longevity in the population worldwide. Understanding the onset and progression of these pathologies can help to develop preventive and disease-modifying treatments. In this work, using RNA-seq data obtained from two brain regions (temporal cortex and cerebellum) of human patients diagnosed with neurodegenerative diseases (Alzheimer's disease or progressive supranuclear palsy) and from two animal models, 5XFAD of amyloidopathy and TauD35 of tauopathy, we performed an integrative analysis at the gene/transcript level combined with a co-expression analysis to identify similarities and discrepancies in the biological processes affected by these two diseases. To compare the different data, we used the only variable common to all datasets: age at death. Thus, we divided the human data into 3 groups: A (70-80), B (81-89) and C (90+); and the animals into groups of 4 months, 12 months, 17 months and 18 months. The results of the transcriptional analysis showed that gene expression alterations associated with immune-inflammatory alterations are present in AD only in the temporal cortex and not in the cerebellum, and that alterations related to synaptic transmission occur late (groups B and C) and are found only when genes with isoform switches are used in the functional enrichment analysis in conjunction with differentially expressed genes. In PSP, all changes associated with immune-inflammatory responses and synaptic transmission are found exclusively in temporal cortex data; however, all changes are specific to group A. In animal models, changes in 5XFAD are similar to those found in AD human brains, with gene expression alterations associated with the immune-inflammatory response present early (4 months) and synaptic terms only at late pathological stages (18 months). In TauD35 mice, this pattern is inverted, with gene expression changes associated with the immune-inflammatory response identified only late (17-month group), whereas those associated with synapses could be identified early (4-month group). In addition to these results, we observed that changes in isoforms (gDTUS) are present almost exclusively in humans, and especially in AD. To refine our results, we used a co-expression approach and identified modules with specific expression and gene signatures. In AD, modules involving synapses did not differ from control; however, modules related to immune-inflammatory response, extracellular matrix and growth factor response were more active in individuals with AD. In PSP, modules with synaptic activity showed greater activity compared to control, while those related to immune response had lower activity. To confirm the genetic identity of these modules, we also mapped module-specific genes to different cell types of the brain using single-cell RNA-seq data. This analysis revealed a correspondence between modules related to the immune-inflammatory response and microglial cells and, to a lesser extent in AD, astrocytes; between synaptic modules and glutamatergic neurons; and between myelination modules and oligodendrocytes. Finally, we show that genes identified as risk factors for AD or PSP are present in specific co-expression modules. Together, these results suggest that in the amyloidopathy model and in AD, alterations in synaptic signaling form a positive feedback loop with the immune-inflammatory response, the latter occurring first, while in the tauopathy model and in PSP, the effects on inflammation are secondary to synaptic changes.
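
The age-at-death grouping described in this abstract (A: 70-80, B: 81-89, C: 90+) can be sketched as a small helper. This is only an illustrative sketch: the function name `age_group` and the handling of ages below 70 (returned as `None`) are assumptions, not details taken from the thesis.

```python
from typing import Optional


def age_group(age_at_death: int) -> Optional[str]:
    """Map an age at death (years) to group A, B or C; None if under 70."""
    if age_at_death >= 90:
        return "C"
    if age_at_death >= 81:
        return "B"
    if age_at_death >= 70:
        return "A"
    return None  # assumption: ages below 70 fall outside the three groups


ages = [72, 80, 81, 89, 90, 95]
groups = [age_group(a) for a in ages]
print(groups)
```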

4
  • ANDRÉ LUIZ DE LUCENA MOREIRA
  • Evolutionary strategies applied to artificial gene regulatory networks

  • Advisor : CESAR RENNO COSTA
  • COMMITTEE MEMBERS :
  • CESAR RENNO COSTA
  • WILFREDO BLANCO FIGUEROLA
  • DIOGO SANTOS PATA
  • Date: Sep 29, 2021


  • Evolution optimizes cellular behavior over successive generations by selecting the successful individual cells in a given context. As gene regulatory networks (GRNs) determine the behavior of single cells by governing the activation of different processes - such as cell differentiation and death - how GRNs change from one generation to the next might have a relevant impact on the course of evolution. It is not clear, however, which mechanisms that affect GRNs effectively favor evolution, and how. Here, we use a population of computational robotic models controlled by artificial gene regulatory networks (AGRNs) to evaluate the impact of different genetic modification strategies on the course of evolution. The virtual agent senses the environment and acts on it like a bacterium in different phototaxis-like tasks - orientation to light, phototaxis, and phototaxis with obstacles. We studied how the strategies of gradual and abrupt changes to the AGRNs impact evolution considering multiple levels of task complexity. The results indicated that a gradual increase in the complexity of the performed tasks is beneficial for the evolution of the model. Furthermore, we observed that larger gene regulatory networks are needed for more complex tasks, with single-gene duplication being an excellent evolutionary strategy for growing these networks, as opposed to full-genome duplication. Studying how GRNs evolved in a biological environment allows us to improve the computational models produced and provides insights into aspects and events that influenced the development of life on Earth.

5
  • PAULO HENRIQUE LOPES CARLOS
  • The impact of the governmental nonpharmaceutical interventions in Brazilian cities during the first SARS-CoV-2 pandemic surge: An agent-based computational modeling study of the City of Natal

  • Advisor : WILFREDO BLANCO FIGUEROLA
  • COMMITTEE MEMBERS :
  • WILFREDO BLANCO FIGUEROLA
  • CESAR RENNO COSTA
  • RENAN CIPRIANO MOIOLI
  • LEANDRO DE ALMEIDA
  • Date: Oct 25, 2021


  • The first wave of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic hit almost all cities in Brazil in early 2020 and lasted for several months. Despite the efforts of state and municipal governments, an inhomogeneous nationwide response resulted in a death toll among the highest recorded globally. To evaluate the impact of the nonpharmaceutical governmental interventions applied by different cities - such as the closure of schools and businesses in general - on the evolution and epidemic spread of SARS-CoV-2, we constructed a full-sized agent-based epidemiological model adjusted to the singularities of individual cities. The model incorporates detailed demographic information, mobility networks segregated by economic segments, and restrictive bills enacted during the pandemic period. As a case study, we analyzed how the City of Natal - a midsized state capital - reacted to the pandemic. Although our results indicate that the governmental response was suboptimal, the restrictive mobility acts saved many lives. Our simulations showed that the suspension of school activities was essential to avoid a high number of deaths (the increase would otherwise be around 525.93%). A genuine closure of work activities would decrease the number of deaths by approximately 67.54%, and of religious activities by 26.7%. The absence of intervention would result in a catastrophic scenario of 6779 deaths, which corresponds to about 0.77% of the population of the city of Natal. The simulations show that a compartmental analysis of the alternative scenarios can inform policymakers about the most impactful measures for further surges of the pandemic and support future decisions as the pandemic progresses.
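
As a quick arithmetic check of the figures in this abstract, 6779 deaths at about 0.77% of the population implies a city population of roughly 880,000. The sketch below only rearranges the two numbers quoted in the abstract; the variable names are illustrative, and the implied population is a derived estimate rather than a figure reported by the thesis.

```python
deaths_no_intervention = 6779      # simulated deaths without intervention (from the abstract)
share_of_population = 0.77 / 100   # about 0.77% of the city population (from the abstract)

# Population implied by the two quoted figures (a derived, rounded estimate)
implied_population = round(deaths_no_intervention / share_of_population)
print(implied_population)
```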

6
  • ELISEU JAYRO DE SOUZA MEDEIROS
  • Genetic basis associated with the serological classification of Leptospira: a case study of the Sejroe serogroup

  • Advisor : TETSU SAKAMOTO
  • COMMITTEE MEMBERS :
  • TETSU SAKAMOTO
  • Jorge Estefano de Santana Souza
  • ANNA MONTEIRO CORREIA LIMA
  • Maria Raquel Venturim Cosate
  • Date: Nov 30, 2021


  • Leptospirosis is a widely distributed zoonosis caused by pathogenic strains of bacteria of the genus Leptospira (Phylum Spirochaetes). Its agents are commonly classified based on their antigenic characteristics into serogroups and serovars, which are relevant for epidemiologic studies and vaccine development. However, the methods used for this are considered laborious and require a specialized infrastructure. Some molecular methods have been proposed to accelerate these procedures, but they still cannot replace the immunological tests, so a further understanding of the genetic basis underlying the serological classification is required. In this work, we focused on elucidating the genetic factors that determine the serogroup Sejroe, which is one of the most prevalent serogroups in livestock. For this, we conducted a comparative genomic analysis using more than 700 leptospiral samples available in public databases. The analysis showed that the genes comprising the rfb locus are the main genetic factors associated with the serological classification. Samples from the Sejroe serogroup have an rfb locus with a conserved gene composition that differs from most other serogroups. Hebdomadis and Mini were the only serogroups whose samples have an rfb locus with a gene composition similar to that of serogroup Sejroe, corroborating the serological affinity shared by them. Finally, we could determine a small region in the rfb locus in which each of those three serogroups can be distinguished by its gene composition. This is the first work that uses an extensive repertoire of genomic data of leptospiral samples to elucidate the molecular basis of the serological classification, and it opens the way to more reliable strategies based on molecular methods for serodiagnosis.


Theses
1
  • DIEGO MARQUES COELHO
  • FROM BULK TO SINGLE-CELL: HOW DIFFERENT TECHNIQUES ASSIST IN IDENTIFICATION OF BIOLOGICAL EVENTS MARKERS?

  • Advisor : MARCOS ROMUALDO COSTA
  • COMMITTEE MEMBERS :
  • MYCHAEL VINÍCIUS DA COSTA LOURENÇO
  • MARCOS ROMUALDO COSTA
  • PATRICIA PESTANA GARCEZ
  • RODRIGO JULIANI SIQUEIRA DALMOLIN
  • TARCISO ANDRE FERREIRA VELHO
  • Date: May 31, 2021


  • Large-scale messenger RNA sequencing (RNAseq) allows the evaluation of the diversity of transcripts expressed at a given moment in a biological system. Through bioinformatics, we can analyze the sequencing data to obtain quantitative information about gene expression, such as the differential expression of genes and their isoforms (alternative splices). In this thesis, we present two independent studies that used bioinformatics to obtain relevant information about different biological phenomena. In the first case, we used mRNA sequencing data from the brains of patients with Alzheimer's disease to study the differential expression of genes and transcripts associated with the progression of this disease. We showed that the analysis of transcripts allows the identification of genetic changes missed by previous studies that evaluated only the global expression of genes. Using single-cell mRNA sequencing data (scRNAseq), we also mapped changes in gene expression in the brains of patients with Alzheimer's disease to specific cell types. The results of this first work contribute to a better understanding of the pathophysiology of Alzheimer's disease and pinpoint possible cell-type-specific molecular mechanisms of the disease. In the second work developed in this thesis, we used the scRNAseq technique to study the diversity of progenitor cells in the early stages of the development of the neocortex. Through analysis of differential gene expression and an approach based on gene regulatory networks, we identified the transcription factor Sox9 as a master regulator of the behavior of different subtypes of neural progenitors. Confirming these bioinformatics findings, genetic experiments to manipulate Sox9 expression levels in neural progenitors demonstrated the importance of this transcription factor in the regulation of cell proliferation and differentiation. Together, the results of this thesis demonstrate the importance of transcriptomic analysis through complementary methods for a better identification of relevant gene expression changes in different biological contexts.

2
  • PRISCILLA SUENE DE SANTANA NOGUEIRA SILVERIO
  • 3D-QSARpy: Combining Variable Selection Strategies and Various Machine Learning Techniques to Build QSAR Models

  • Advisor : EUZEBIO GUIMARAES BARBOSA
  • COMMITTEE MEMBERS :
  • AMANDA GONDIM DE OLIVEIRA
  • ANNE MAGALY DE PAULA CANUTO
  • ARAKEN DE MEDEIROS SANTOS
  • EUZEBIO GUIMARAES BARBOSA
  • JOAO PAULO MATOS SANTOS LIMA
  • LAURA EMMANUELLA ALVES DOS SANTOS SANTANA DE OLIVEIRA
  • Date: Aug 4, 2021


  • Quantitative Structure-Activity Relationship (QSAR) is a technology in the field of medicinal chemistry that seeks to clarify the relationships between molecular structures and their biological activities. For this, QSAR models are constructed from the structural data (2D, 3D or 4D) of a series of molecules already tested for a given activity. The predictions made by these models are used to identify which modifications in the molecule can influence the biological response, whether by reinforcing it or not. Such technology accelerates the development of new compounds by reducing the costs of drug design. In this context, the present work proposes a methodology and tests it on several data sets through the development of a tool for 3D-QSAR, named 3D-QSARpy. The methodology was successfully validated by applying the tool to two data sets, with results that outperformed those previously published. The first set, involving diabetes treatment, reached r²pred = 0.91. The second set, referring to cancer treatment, reached r²pred = 0.98. Finally, two applications of the tool were performed, contributing to the identification of new bioactive molecular structures using different approaches. The first is intended for the treatment of Chagas disease and includes the construction of hybrid QSAR models for three series, obtaining r²pred = 0.8, 0.68 and 0.85. The second application was the construction of a 4D-QSAR model for tuberculosis treatment, with r²pred = 0.72. Whether the experiments were for validation or for the identification of these new molecules, all of them demonstrated not only the efficiency of the proposed methodology and the developed tool, but also the versatility of possible applications of this methodology, either following its general pipeline or using it partially, combined with other existing tools.

3
  • RAFFAEL AZEVEDO DE CARVALHO OLIVEIRA ANDRADE
  • Reverse engineering of pediatric sepsis regulatory network and master regulators identification

  • Advisor : RODRIGO JULIANI SIQUEIRA DALMOLIN
  • COMMITTEE MEMBERS :
  • RODRIGO JULIANI SIQUEIRA DALMOLIN
  • CESAR RENNO COSTA
  • JOAO PAULO MATOS SANTOS LIMA
  • FABIO KLAMT
  • MATHEUS AUGUSTO DE BITTENCOURT PASQUALI
  • Date: Aug 11, 2021


  • Sepsis is an acute inflammatory syndrome accountable for most deaths in ICUs all over the world. Due to its multifactorial nature, there are few studies related to gene expression regulation in pediatric septic patients. Understanding the regulatory mechanisms of sepsis could help to combat it and also help identify key points of the signaling pathways responsible for disease progression. A good strategy to identify regulatory targets of a given disease is to reconstruct its regulatory network and identify its possible master regulators. Given the lack of pediatric sepsis data and the huge difference between adult and pediatric immune responses, the objective of this work is to reconstruct the pediatric sepsis regulatory network and identify its putative master regulators. In summary, we found 15 transcription factors that have a good chance of acting as master regulators in pediatric sepsis. In particular, MEF2A, TRIM25 and RFX2 were found to be upregulated in septic patients in comparison to healthy individuals. Each of them has a distinct role that has not been directly related to sepsis. Taken together, however, we hypothesize that they might act jointly to influence the disease prognosis. The results found herein point towards these three transcription factors as putative master regulators of pediatric sepsis. In vitro validation of the results found in silico could shed light on different aspects of the regulation of pediatric sepsis.

4
  • JOSIVAN RIBEIRO JUSTINO
  • MODEL FOR IDENTIFYING BIMODAL GENES ASSOCIATED WITH CANCER PROGNOSIS

  • Advisor : SANDRO JOSE DE SOUZA
  • COMMITTEE MEMBERS :
  • Giovana Torrezan
  • Jorge Estefano de Santana Souza
  • MARCUS ALEXANDRE NUNES
  • SANDRO JOSE DE SOUZA
  • ÂNDREA KELY CAMPOS RIBEIRO DOS SANTOS
  • Date: Sep 16, 2021


  • In recent decades, the biological interest in understanding the phases of gene regulation has led to the discovery of tumor genes with differentiated expression in subgroups of patients. These genes have a bimodal distribution of expression values, which has drawn attention to the investigation of their patterns of development and their functionality. A major limitation of traditional methods lies in identifying homogeneous subgroups representing distinct levels of gene expression for the same tumor. We developed a method that selects candidate genes for the bimodality pattern from the probability density function of the expression values, which minimizes the internal heterogeneity of the peaks. We analyzed 25 tumor types and found 96 genes with consistent samples regarding survival prognosis, with a p-value ≤ 0.01. As a contribution, we provide a method with freely available code that reduces the internal variability of the groups and relates the bimodal expression pattern to survival prognosis. Thus, we believe that the method may be useful in the evaluation of the bimodal pattern of gene expression and in the discovery of new clinical biomarkers for different types of cancer.

5
  • INACIO GOMES MEDEIROS
  • Sequence feature selection to solve biological questions related to variant analysis and Anti-SARS-CoV-2 siRNAs development.

  • Advisor : Jorge Estefano de Santana Souza
  • COMMITTEE MEMBERS :
  • ARAKEN DE MEDEIROS SANTOS
  • BEATRIZ STRANSKY FERREIRA
  • Jorge Estefano de Santana Souza
  • SIDNEY EMANUEL BATISTA DOS SANTOS
  • TIRZAH BRAZ PETTA
  • Date: Sep 21, 2021


  • Analysis of variants in clinical context and the support for the development of therapies against viral diseases are two areas which several research have used processes of integration and analysis of omics data. Assessing whether a given variant has a pathogenic impact is a challenge in the analysis of variants, especially when different tools for predicting pathogenicity point to divergent results. Regarding the development of RNA interference-based therapies, it is observed that there is a continuing need to design and evaluate the efficiency of new small-interfering RNAs (siRNAs) for each new virus that arises, like SARS-CoV-2, responsible for the COVID-19 pandemic. In this sense, it is argued in this thesis, based on the discussion of two works, that data integration and feature selection processes can contribute to the resolution of issues related to the identification of pathogenicity of variants and, in a second moment, to the availability of information and characteristics of sequences that may serve as the basis for therapies for COVID-19. In general terms, the study aimed (a) to develop data integration methods and selection of variant characteristics to measure pathogenicity and (b) to develop data integration methods for the construction of a database of siRNAs for SARS-CoV-2. To achieve the first objective, a decision tree-based classification model was proposed to estimate the pathogenicity of variants, built through an integration process of public data of already cataloged variants with pathogenicity predictions provided by machine learning-based tools. The model was able to present a higher accuracy than the state of the art regarding the prediction of pathogenicity of variants, constituting an important tool to support health professionals, such as in the diagnosis of genetic diseases. 
For the second objective, data on available properties, thermodynamics, toxicity, similarity, and efficiency were combined to assemble a global catalog of siRNAs for SARS-CoV-2. Integrating diverse siRNA properties in a single consolidated database provides an information reference that allows simple and targeted filtering of siRNAs, sparing many wet-lab tests on candidate molecules for COVID-19 antiviral therapies. These studies share features with other data integration works in aspects involving data diversity, reproducibility, and knowledge discovery. Finally, it was found that these studies have potential for clinical application, either to increase the understanding of variants related to different genetic comorbidities, in the case of the first work, or to support the development of therapies against COVID-19, in the case of the second.

6
  • ANA CLÁUDIA COSTA DA SILVA
  • in silico investigation of synaptic sleep reorganization mechanism. An algorithm to maximize computational capacity of sparse neural networks

  • Advisor : SIDARTA TOLLENDAL GOMES RIBEIRO
  • COMMITTEE MEMBERS :
  • SIDARTA TOLLENDAL GOMES RIBEIRO
  • CESAR RENNO COSTA
  • RODRIGO JULIANI SIQUEIRA DALMOLIN
  • MADRAS VISWANATHAN GANDHI MOHAN
  • MAURO COPELLI
  • NIVALDO ANTONIO PORTELA DE VASCONCELOS
  • Data: Nov 9, 2021


  • Memories are stored in the brain through persistent changes in the connectivity between neurons. Sleep plays an essential role in such changes. Research on sleep neurology has shown the activation of long-term synaptic plasticity. Experimental data point to a dual role of sleep: the weakening of irrelevant memories and the reinforcement of more important ones. The hypothesis investigated in this thesis is that synaptic reinforcement and pruning, involved in memory consolidation, can bring advantages to artificial neural networks. This thesis aims to apply neurobiological sleep-dependent learning mechanisms to machine learning. For this, we carried out a review of memory consolidation theories and the computational models that support these theories. Observing how the brain optimizes biological resources, the research followed the trend in artificial neural networks of applying concepts from biological learning to machine learning. Computer simulations were then carried out to explore the hypothesis that the underlying mechanisms used by the brain for biological learning through sleep are capable of optimizing artificial neural network learning. Synaptic sparsity can save resources without degrading learning, so we used a sparse artificial neural network to learn different datasets and then tested whether sleep could reduce the minimum number of synapses that a system needs to learn patterns. The simulations were carried out with different network sizes, different sparsity levels, and several datasets, using modern frameworks and algorithms for artificial neural network learning. The results corroborate the hypothesis that sleep reduces the number of synapses required to reach a given learning limit.

7
  • GUILHERME FERNANDES DE ARAÚJO
  • A simulation platform for evolutionary biological scenarios applied to the extended fitness hypothesis

     

  • Advisor : SANDRO JOSE DE SOUZA
  • COMMITTEE MEMBERS :
  • André Fujita
  • CESAR RENNO COSTA
  • DIOGO MEYER
  • JOAO PAULO MATOS SANTOS LIMA
  • SANDRO JOSE DE SOUZA
  • Data: Nov 24, 2021


  • The impact of extended phenotypes on the contemporary theory of evolution is controversial. The extended phenotype theory states that the expression of genes may have effects beyond the body of the individual that carries them, affecting the evolutionary outcomes of other individuals that coexist with it. The extended fitness hypothesis proposes that individuals with enough genetic similarity may use each other's extended phenotypes, thus increasing the chances of survival and reproduction of the group as a whole. This work aims to model these interactions through random scale-free networks and to investigate the impact of extended phenotypes on the reproductive success of individuals in groups capable of producing and sharing them. The advantages gained from using extended phenotypes released by similar neighbors may provide an evolutionary incentive at the group level to build and share them, and this equilibrium is measured in different simulations of behavior models.

2020
Dissertations
1
  • LUCAS CAIÃ DE SOUZA TAVARES
  • Hippocampal-Prefrontal Interactions during Spatial Decision-Making

  • Advisor : ADRIANO BRETANHA LOPES TORT
  • COMMITTEE MEMBERS :
  • ABNER CARDOSO RODRIGUES NETO
  • ADRIANO BRETANHA LOPES TORT
  • CESAR RENNO COSTA
  • WILFREDO BLANCO FIGUEROLA
  • Data: Feb 28, 2020


  • The hippocampus has been linked to memory encoding and spatial navigation, while the prefrontal cortex is associated with cognitive functions such as decision-making. These regions are hypothesized to communicate in tasks that demand both spatial navigation and decision-making processes. However, the electrophysiological signatures underlying this communication remain to be better elucidated. To investigate the dynamics of the hippocampal-prefrontal interactions, we analyzed local field potentials and spikes recorded from rats performing an odor-cued spatial alternation task in an 8-shaped maze. We found that the phase coherence of both theta (6-10 Hz) and beta (23-30 Hz) peaked around the choice point area of the maze. Moreover, Granger causality revealed a hippocampus->prefrontal cortex directionality of information flow at theta frequency, peaking at starting areas of the maze, and in the reverse direction at delta frequency, peaking near the turn onset. Additionally, the patterns of phase-amplitude cross-frequency coupling within and between the regions showed spatial selectivity. Lastly, we found that the theta rhythm dynamically modulated neurons in both regions; interestingly, prefrontal cortex neurons were more strongly modulated by the hippocampal theta rhythm than by its LFP. In all, our results reveal maximum electrophysiological interactions between the hippocampus and the prefrontal cortex near the decision-making period of the spatial alternation task. These results corroborate the hypothesis that a dynamic interplay between these regions takes place during spatial decisions.

2
  • EDEN SILVA E SOUZA
  • Evaluation of the predicted target of plumieridine in Cryptococcus neoformans var. grubii H99

  • Advisor : MARILENE HENNING VAINSTEIN
  • COMMITTEE MEMBERS :
  • MARILENE HENNING VAINSTEIN
  • EUZEBIO GUIMARAES BARBOSA
  • GUSTAVO ANTONIO DE SOUZA
  • CHARLEY CHRISTIAN STAATS
  • Data: Feb 28, 2020


  • Cryptococcosis is a fungal infection caused by yeasts of Cryptococcus spp. The infection starts when desiccated cells or spores are inhaled and reach the lungs. If the disease is not properly treated, the infection can progress to the central nervous system and result in cryptococcal meningitis and even death. The treatment of cryptococcosis is carried out in three stages and uses three drugs: fluconazole, amphotericin B and 5-flucytosine. Although effective, these drugs can lead to fungal resistance and can be toxic to patients. This work aims to investigate the mode of action of the antifungal compound plumieridine and to identify its molecular target in C. neoformans. For this, a series of in vitro and in silico experiments were carried out. Initially, a chromatographic fraction containing plumieridine was obtained from the aqueous extract of Allamanda polyantha seeds, and the presence of the compound was observed through carbon and hydrogen nuclear magnetic resonance. Antifungal activity, assessed through MIC, was 0.250 mg/mL. Through ligand similarity-based virtual screening, chitinase was identified as plumieridine’s molecular target. Three-dimensional models of C. neoformans chitinases were created and, through molecular docking, plumieridine was observed to interact with residues in the active site. Chitinolytic inhibitory activity assays show that activity is significantly reduced in the secreted fraction and the soluble cell fraction; in the insoluble cell fraction, however, plumieridine reduces chitinolytic activity only slightly, and higher concentrations of the compound are needed. Although plumieridine is able to inhibit chitinolytic activity, the compound does not appear to affect the transcriptional levels of C. neoformans chitinases: only transcription of CHI22 was reduced in the presence of plumieridine.
Treatment with plumieridine also alters the distribution pattern of chitooligomers in the cell wall, from a polarized pattern to a diffuse pattern throughout the wall. The results confirm the prediction of virtual screening and show that inhibition of chitinolytic activity by plumieridine results in incomplete cell division and, consequently, cell death.

3
  • RENATA LILIAN DANTAS CAVALCANTE
  • Exploratory investigation of genetic factors associated with the sex-determination system in Arapaima gigas (Pirarucu)

  • Advisor : TETSU SAKAMOTO
  • COMMITTEE MEMBERS :
  • TETSU SAKAMOTO
  • GUSTAVO ANTONIO DE SOUZA
  • SIDNEY EMANUEL BATISTA DOS SANTOS
  • Data: Mar 30, 2020


  • The pirarucu (Arapaima gigas) is one of the largest freshwater bony fish in the world, with adults that can weigh 200 kilograms and measure 3 meters in length. It belongs to the family Arapaimidae, order Osteoglossiformes, and has the Amazon Basin as its natural habitat. Due to its large size, low fat content and few bones, Arapaima gigas has quickly become a species of special interest in fish farming. A problem related to its fishery exploitation is that the genetic mechanisms that control sexual differentiation in Arapaima gigas are not known. Sexual maturation in Arapaima gigas occurs late, around the third to fifth year of life, and sexual dimorphism is not a strong characteristic of the species. For more sustainable management, it is of paramount importance to find an effective and non-invasive method to sexually differentiate juvenile individuals of Arapaima gigas. For this, establishing molecular genetic markers related to sexual differentiation would be advantageous. Previous analyses of the Arapaima gigas genome could not find large, statistically significant genomic regions associated with the sex-determination system of these individuals. In this study, we applied less common bioinformatics approaches to identify genomic differences between individuals of opposite sexes, with the intention of identifying repetitive regions in excess or scarcity in one sex. For this purpose, we used genomic data from six adult representatives of Arapaima gigas, three males and three females, in addition to the pirarucu reference genome (ID: 12404) deposited in NCBI. After these exploratory genome analyses, we noticed the existence of k-mers that are represented differently between individuals of opposite sexes. We also identified 22 scaffolds showing haploidy in one sex and the opposite scenario (absence of haploidy) in the other.
Additionally, we identified a microsatellite panel for Arapaima gigas, in which 95,485 microsatellites were found. Knowledge of these microsatellite regions is very important for the continuation of this work, as it enables their use as molecular markers of genomic regions, which would facilitate experimental techniques for isolating sequences of interest, especially when associated with the portions showing haploidy in only one of the sexes of Arapaima gigas. The different proportions in the counts of k-mers and heterozygous sites (haploidy) may indicate the existence of genetic factors which, if confirmed through bench experiments, can aid in the sexing of Arapaima gigas individuals.
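The idea of k-mers being represented differently between the sexes can be illustrated with a minimal sketch. It is not the pipeline used in the dissertation: the function names, the pseudocount of 1 and the fold threshold are assumptions made only for illustration, and real analyses would work on sequencing reads with a dedicated k-mer counter.

```python
from collections import Counter

def count_kmers(seq, k):
    """Count every overlapping k-mer in a DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def pooled_counts(male_seqs, female_seqs, k):
    """Return {kmer: (male_count, female_count)} pooled over each group."""
    male, female = Counter(), Counter()
    for s in male_seqs:
        male.update(count_kmers(s, k))
    for s in female_seqs:
        female.update(count_kmers(s, k))
    return {km: (male[km], female[km]) for km in set(male) | set(female)}

def sex_biased(counts, min_fold=2.0):
    """Keep k-mers whose pooled counts differ by at least min_fold.

    A pseudocount of 1 (an assumption of this sketch) avoids division by zero.
    """
    biased = {}
    for km, (m, f) in counts.items():
        fold = (m + 1) / (f + 1)
        if fold >= min_fold or fold <= 1 / min_fold:
            biased[km] = (m, f)
    return biased
```

For example, `sex_biased(pooled_counts(male_reads, female_reads, 21))` would list the 21-mers that are over- or under-represented in one group, where `male_reads` and `female_reads` are hypothetical lists of sequences.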

4
  • FELIPE VIEIRA DA FONSECA
  • COMPARISON OF RESIDUE INTERACTION NETWORKS (RINs) TO ASSESS CONFORMATIONAL PROTEIN VARIATION

  • Advisor : JOAO PAULO MATOS SANTOS LIMA
  • COMMITTEE MEMBERS :
  • JOAO PAULO MATOS SANTOS LIMA
  • GUSTAVO ANTONIO DE SOUZA
  • RODRIGO MARANGUAPE SILVA DA CUNHA
  • Data: Jun 30, 2020


  • Changes in the amino acid sequence may result in alterations in the three-dimensional protein structure, which may lead to partial or complete loss of function. One way to represent the chemical interactions between all amino acids in a protein is through the construction of residue interaction networks (RINs). In RINs, a graph represents the protein 3D structure, with the nodes as amino acid residues, and the edges as the physicochemical interactions between amino acids. We hypothesize that the comparison between RINs of the same protein in different conformations can be used to evaluate the effects of mutations and polymorphisms, as well as for the analysis and validation of theoretical protein models. Therefore, the present work aimed to build a tool to compare different RINs for a protein and to use such data to estimate conformational differences between proteins and also validate models generated by homology modeling. RINs were created using the RING 2.0 (Residue Interaction Network Generator) program. The tool developed for this purpose, called Comparator of Residue Interaction Networks (CoRINs), compares all RIN nodes generated from different structure files (PDBs) of the same protein, taking into account position, chain and residue, as well as their interactions with the other amino acids. The tool also presents a plot that estimates the variation of interactions formed by each residue, which we propose as an estimate for the conformational alterations of that protein site, from a set of compared PDBs. As a possible application for this tool, we used a dataset with oncogenes and tumor suppressor genes with their respective reported mutations mapped according to the connectivity deviation of each residue. Then we retrieved the different conformations for each resulting protein from a bank of structural conformers and constructed the RINs using the software RING 2.0 and compared them with CoRINs.
The results show that mutations occurring in the tested oncogenes are more likely to occur in protein sites with a more significant deviation in the mean number of chemical interactions. Additionally, most of these genes’ mutations annotated as pathogenic and associated with clinical cancer cases occurred at sites with the most significant changes in chemical and physical interactions. These results demonstrate that the CoRINs tool can be useful in identifying non-covalent interactions essential for protein structure maintenance and in evolutionary studies, such as the maintenance of function in homologous proteins with high sequence divergence, as well as for the comparison and validation of theoretical structural models.
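A per-residue connectivity deviation of the kind described above can be sketched as follows. This is only an illustration and not the CoRINs implementation: the residue identifiers (chain, position, residue name), the use of the population standard deviation, and the function names are assumptions, and parsing of RING output files is left out.

```python
from statistics import pstdev

def residue_degrees(edges):
    """Map each residue to its number of interactions in one network."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg

def connectivity_deviation(networks):
    """Standard deviation of each residue's degree across several networks.

    networks: list of edge lists, one per conformation of the same protein.
    A residue absent from a network counts as having zero interactions there.
    """
    degrees = [residue_degrees(edges) for edges in networks]
    residues = set().union(*degrees)
    return {r: pstdev([d.get(r, 0) for d in degrees]) for r in sorted(residues)}
```

Residues with a larger value change their number of interactions more between conformations, which is the quantity the abstract proposes as an estimate of conformational alteration at that site.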

5
  • IGOR AUGUSTO BRANDÃO
  • Systems biology approaches in the investigation of articulation points in KEGG metabolic pathways

  • Advisor : RODRIGO JULIANI SIQUEIRA DALMOLIN
  • COMMITTEE MEMBERS :
  • RODRIGO JULIANI SIQUEIRA DALMOLIN
  • CESAR RENNO COSTA
  • RICARDO D'OLIVEIRA ALBANUS
  • Data: Aug 14, 2020


  • Studying protein essentiality through laboratory methods is expensive, time-consuming and not scalable to large numbers of proteins. Besides, it is relevant to evaluate the essentiality of several proteins of a metabolic pathway as a whole. Metabolic pathways can be analyzed as graphs, which provide several tools to study topological features such as articulation points. Current bioinformatics research studies protein essentiality based on betweenness and degree metrics; however, graph theory suggests that articulation points could be essential nodes in a network. It remains to be determined whether these articulation points are essential in metabolic pathways and what their topological impact on the network is. Using network analysis via metrics and biological curation, we aim to verify whether bottlenecks are the proteins with the highest frequencies and located in the center of KEGG metabolic pathways. For this purpose, we identified the articulation points in different networks, evaluated the impact of each articulation point, calculated their frequency and compared them with occurrences of non-articulation points. We retrieved KEGG pathways available as KGML files and transformed the data into graph objects. Two centrality parameters, articulation points and degree, were determined, and the essential proteins were classified based on these parameters. Approximately 20% of the proteins are articulation points. The high-frequency articulation points located in central regions of the network were considered the most important (3.75%). In addition, the highest concentration of articulation points occurred in the frequency range of 80-90%. A pattern of non-randomness of articulation points was identified in the protein groups that have a frequency of at least 74.5%. Finally, steroid biosynthesis is the metabolic pathway with the highest number of articulation points with frequency higher than 80%.
Besides, oxidoreductase is the articulation point class present in the highest number of metabolic pathways. Overall, the findings suggest that bottlenecks are the articulation points with the highest frequencies that are located in the center of the network. A deeper analysis of the biological roles of the articulation points remains to be performed.
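An articulation point is a node whose removal disconnects part of a graph. A minimal sketch of how such nodes can be found with a depth-first search is given below; it assumes an undirected graph stored as a dictionary of neighbours and is not the code used in the dissertation, which worked on graphs built from KGML files. The function name is hypothetical.

```python
def articulation_points(graph):
    """Return the set of articulation points of an undirected graph.

    graph: dict mapping each node to an iterable of its neighbours.
    Uses the classic discovery-time / low-link depth-first search.
    """
    disc, low, points = {}, {}, set()
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        children = 0
        for v in graph[u]:
            if v not in disc:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # A non-root node is an articulation point if some child
                # subtree cannot reach above it without passing through it.
                if parent is not None and low[v] >= disc[u]:
                    points.add(u)
            elif v != parent:
                low[u] = min(low[u], disc[v])
        # The root is an articulation point if it has more than one DFS child.
        if parent is None and children > 1:
            points.add(u)

    for node in graph:
        if node not in disc:
            dfs(node, None)
    return points
```

The recursion is fine for pathway-sized graphs; very large graphs would need an iterative version or a higher recursion limit.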

6
  • DANILO LOPES MARTINS
  • Exploratory analysis of Arapaima gigas transcriptome

  • Advisor : Jorge Estefano de Santana Souza
  • COMMITTEE MEMBERS :
  • Jorge Estefano de Santana Souza
  • RODRIGO JULIANI SIQUEIRA DALMOLIN
  • SIDNEY EMANUEL BATISTA DOS SANTOS
  • Data: Sep 29, 2020


  • Arapaima gigas, known as pirarucu, is considered one of the largest freshwater fish in the world and is of notable interest to aquaculture due to its particular biological characteristics, including rapid growth in its early years. In recent years, despite the massive availability of data from sequencing projects, few have addressed the taxon that includes this species. The present study aimed to characterize the transcriptome of this species through an exploratory transcriptional analysis and the analysis of gene expression patterns related to specific gene profiles, in addition to highlighting sex-specific genes. By cDNA sequencing of 12 different tissue samples from Arapaima gigas, a reference transcriptome was assembled with a genome-guided assembly strategy. The gene expression profiles of different male and female tissues of adult specimens were analyzed. Tools such as Hisat2, Braker2, Trinity, Diamond and mygene were used for the assembly and annotation of genes, as well as clusterProfiler and KEGG tools for functional enrichment analysis and animalTFDB for identifying transcription factors. In this study, we highlighted a set of annotated genes that may be potential candidates for biotechnological products, as they are involved in individual tissue phenotypes, sexual dimorphism processes, and the regulation of processes that can explain the species' unique morphological characteristics. This study can also substantially guide further analyses.

Thesis
1
  • KATYANNA SALES BEZERRA
  • QUANTUM BIOCHEMICAL STUDY OF INTERACTIONS BETWEEN THE ANDROGENIC RECEPTOR, rRNA AND MCL-1 AND LIGANDS

  • Advisor : UMBERTO LAINO FULCO
  • COMMITTEE MEMBERS :
  • DOUGLAS SOARES GALVAO
  • EUDENILSON LINS DE ALBUQUERQUE
  • RODRIGO JULIANI SIQUEIRA DALMOLIN
  • UMBERTO LAINO FULCO
  • VALDER NOGUEIRA FREIRE
  • Data: Mar 24, 2020


  • This thesis presents three studies carried out in the field of ab initio simulation, based on principles of quantum mechanics. The first study presents the particularities of the interactions between the androgen receptor (AR) carrying a T877A mutation, which promotes promiscuity in the receptor, and two antagonist drugs, cyproterone acetate and hydroxyflutamide (CPA and HFT), and an agonist compound (RLL). The interaction energies were obtained with quantum chemistry methods based on Density Functional Theory (DFT), using the Molecular Fragmentation with Conjugated Caps (MFCC) method. The results demonstrate the individual relevance of the interactions between T877A-AR and the ligands, pointing out the main residues that make these interactions. The second study analyzes the interaction between 16S ribosomal RNA and hygromycin B (hygB), an aminoglycoside antibiotic that affects ribosomal translocation, using the MFCC strategy in light of DFT and a parameterization of dielectric constants. The results showed that nucleotides C1403, C1404, G1405, A1493, G1494, U1495, C1496 and U1498 had the most negative binding energies, making them strong candidates for stabilizing hygB in a suitable binding pocket of the 30S ribosomal subunit of prokaryotes. The third study investigates the interactions between the anti-apoptotic protein MCL-1, whose overexpression can block the apoptosis signaling pathway and allow disordered cell growth, and seven chemical compounds with the potential to inhibit the protein. The methodology also uses quantum methods based on DFT, in addition to MFCC. The results showed that the residues Arg263, Met231, Val253, Phe270, Phe228, Phe254, Leu267 and Thr266 are of crucial importance for the binding of inhibitors to the hydrophobic pocket of MCL-1. The computational methods used in the three studies emerge as an elegant and efficient alternative for drug development.

2
  • FREDERICO LEMOS DOS SANTOS
  • VECTOR-MEDIATED EPIDEMIC PROCESS AND SIS-MODEL PROCESS ON A COMPLEX NETWORK: A STUDY OF CRITICAL PROPERTIES

  • Advisor : UMBERTO LAINO FULCO
  • COMMITTEE MEMBERS :
  • UMBERTO LAINO FULCO
  • JOAO PAULO MATOS SANTOS LIMA
  • ANTONIO DE MACEDO FILHO
  • MAURICIO LOPES DE ALMEIDA
  • PAULO HENRIQUE RIBEIRO BARBOSA
  • Data: Aug 19, 2020


  • Since 1990, epidemic spread has been the subject of many studies based on statistical physics methods. The dynamics of these epidemic processes, typically non-equilibrium, consist of a competition between active (infected host) and inactive (uninfected host) health states. The transition between these active (epidemic) and inactive (non-epidemic) states allows the analysis of the critical point and critical exponents of the system (universality class). In this thesis, the critical properties of two epidemic systems are investigated. The first is composed of two population species: humans, with uninfected hosts (H) and infected hosts (Hi), and vectors, with uninfected vectors (V) and infected vectors (Vi), which spread independently on a one-dimensional network, at rates D, following a dynamic probability rule, where the cure rates of vectors and individuals are φ and λ, respectively. The second is the epidemic system known as susceptible-infected-susceptible (SIS), on a complex network with a high aggregation factor and contamination rate λ. For both models, computer simulations using the Monte Carlo method are used to obtain the data, and a finite-size scaling analysis is performed to estimate critical properties. The conclusion of this work is the analysis of critical points and critical exponents. It is expected to define a new universality class and to draw a parallel with the methodology used by epidemiology to combat infectious diseases.
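A Monte Carlo simulation of an SIS process can be sketched in a few lines. The example below is a deliberately simplified illustration and not the model studied in the thesis: it assumes a ring lattice instead of a complex network, synchronous updates, a single initially infected node, and hypothetical parameter names `infect_p` and `recover_p`.

```python
import random

def ring(n):
    """Neighbour lists for a ring of n nodes (each node has two neighbours)."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]

def sis_step(state, neighbours, infect_p, recover_p, rng):
    """One synchronous SIS update; state[i] is True if node i is infected."""
    new_state = list(state)
    for i, infected in enumerate(state):
        if infected:
            # An infected node recovers and becomes susceptible again.
            if rng.random() < recover_p:
                new_state[i] = False
        else:
            # A susceptible node can be infected by each infected neighbour.
            for j in neighbours[i]:
                if state[j] and rng.random() < infect_p:
                    new_state[i] = True
                    break
    return new_state

def infected_density(n, infect_p, recover_p, steps, seed=0):
    """Fraction of infected nodes after a number of steps, from one seed node."""
    rng = random.Random(seed)
    neighbours = ring(n)
    state = [i == 0 for i in range(n)]
    for _ in range(steps):
        state = sis_step(state, neighbours, infect_p, recover_p, rng)
    return sum(state) / n
```

Scanning `infect_p` for several system sizes `n` and averaging the resulting density over many seeds is the kind of data a finite-size scaling analysis would start from.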

3
  • EDUARDO NOGUEIRA CUNHA
  • A low-cost smart system for electrophoresis-based nucleic acids detection at the visible spectrum

  • Advisor : JOAO PAULO MATOS SANTOS LIMA
  • COMMITTEE MEMBERS :
  • ADRIAO DUARTE DORIA NETO
  • ALEXSANDRO SOBREIRA GALDINO
  • DANIEL CARLOS FERREIRA LANZA
  • JOAO PAULO MATOS SANTOS LIMA
  • MARCELO AUGUSTO COSTA FERNANDES
  • RODRIGO MARANGUAPE SILVA DA CUNHA
  • Data: Nov 20, 2020


  • Nucleic acid detection by electrophoresis is still a quick and accessible technique for many diagnostic methods, primarily in research laboratories or point-of-care units. Standard protocols detect DNA/RNA molecules through specifically bound chemical dyes using a UV transilluminator or UV photo-documentation system. However, the acquisition costs and availability of these devices, mainly the ones with photography and internet connection capabilities, can be prohibitive, especially for public health units in developing countries. Also, ultraviolet radiation is an additional risk factor for professionals who use electrophoresis-based nucleic acid detection. With that in mind, this work describes the development of a low-cost smart DNA/RNA detection system capable of obtaining qualitative and semi-quantitative data from gel analysis. The proposed device exploits the visible light absorption range of commonly used DNA/RNA dyes using readily available parts and simple manufacturing processes, such as light-emitting diodes (LEDs) and 3D printing. By applying IoT techniques, our system covers a wide range of the color spectrum in order to detect bands from various commercially used dyes, using Bluetooth communication and a smartphone for hardware control, image capturing, and sharing. The project also enables process scalability and has low manufacturing and maintenance costs. The use of LEDs in the visible spectrum can achieve very reproducible images, providing a high potential for rapid and point-of-care diagnostics as well as applications in several fields such as healthcare, agriculture, and aquaculture.

2019
Dissertations
1
  • PAULO EDUARDO TOSCANO SOARES
  • Metagenome of a Penaeus vannamei shrimp infected with the White Spot Syndrome Virus

  • Advisor : DANIEL CARLOS FERREIRA LANZA
  • COMMITTEE MEMBERS :
  • DANIEL CARLOS FERREIRA LANZA
  • Jorge Estefano de Santana Souza
  • ANDRE MAURICIO RIBEIRO DOS SANTOS
  • Data: Mar 11, 2019


  • White-leg shrimp (Penaeus vannamei) is the most widely cultivated species in aquaculture in the world. Commercial cultivation usually occurs at high densities, which favors the selection of virulent pathogens, causing epidemic outbreaks. Among the pathogens that affect shrimp, the White Spot Syndrome Virus (WSSV) is known for outbreaks that can result in more than 80% mortality in less than a week. As a result, preventive strategies that allow the identification and monitoring of the microbiota in cultures have become increasingly necessary, especially in intensive systems. Recently, the use of metagenomics has been suggested for monitoring in aquaculture. Several studies have used 16S metagenomics to study the microbiota associated with healthy shrimp or shrimp infected with specific pathogens. Other studies have used shotgun metagenomics to discover new viruses. Shotgun metagenomics is potentially more informative than marker-gene metagenomics, allowing the retrieval of genomic information from the host and its symbionts, including viruses, whose composition may act as a bioindicator of the disease stage. In this study, shotgun metagenomics was used to analyze the caudal muscle of a P. vannamei specimen infected by WSSV. Taxonomic and functional classifications were made to obtain the respective profiles of the metagenomic data. P. vannamei and WSSV were the most abundant organisms in the classification by reads. In the analysis of the contigs, the greatest abundance of contigs was observed for shrimp, bacteria and WSSV, in that order. Functional classification was performed using the MEGAN software and resulted in few representative groups of protein functions, which were not sufficient to establish a functional profile of the sample. A taxonomic classification based on BLASTx was also performed with MEGAN and gave results similar to the classification using BLASTn. The BLASTn results enabled the assembly of the complete mitochondrial genome of P. vannamei. This study supports the use of shotgun metagenomics as a tool for monitoring the microbiota in shrimp cultures, making it possible to simultaneously retrieve information useful for population genetics (through the shrimp mitochondrial genome) and for monitoring symbionts and pathogens, such as bacteria and WSSV.

2
  • ANA CAROLINA MIRANDA FERNANDES COÊLHO
  • neoANT-HILL: an integrated tool for identification of potential neoantigens

  • Advisor : SANDRO JOSE DE SOUZA
  • COMMITTEE MEMBERS :
  • Jorge Estefano de Santana Souza
  • SANDRO JOSE DE SOUZA
  • ÂNDREA KELY CAMPOS RIBEIRO DOS SANTOS
  • Data: Apr 18, 2019


  • In recent years, neoantigens have generated great interest in immunotherapy due to their ability to elicit antitumor immune responses. Neoantigens arise from specific somatic mutations and can be presented by HLA molecules on the surface of tumor cells and recognized by T cells as non-self molecules. Several studies have indicated promising results in the use of neoantigens in different immunotherapeutic approaches. However, the precise identification of neoantigens remains challenging. Therefore, the aim of the present work was to develop a computational tool that integrates the individual immunogenetic analyses that are fundamental for the identification of potential neoantigens. RNA-seq data from the GEUVADIS project and melanoma mutation data obtained from TCGA were used to validate the developed pipeline. As a result, we developed a tool, called neoANT-HILL, written in Python and available through a friendly and interactive graphical user interface. Whole-genome or exome sequencing data and/or RNA-Seq data are used to perform the immunogenomic analyses. The integration of the results allows the identification of potential neoantigen candidates for immunotherapy.

3
  • PEDRO IGOR CÂMARA DE OLIVEIRA
  • Planning new Trypanosoma cruzi CYP51 inhibitors using QSAR studies

  • Advisor : EUZEBIO GUIMARAES BARBOSA
  • COMMITTEE MEMBERS :
  • EUZEBIO GUIMARAES BARBOSA
  • MARCUS TULLIUS SCOTTI
  • PAULO MARCOS DA MATTA GUEDES
  • Data: Jun 7, 2019


  • Chagas disease kills over 10,000 people per year, and approximately 8 million people are infected by Trypanosoma cruzi. The reference drug for treatment of the disease, benznidazole, has been the same since the 1970s. In recent years, many CYP51 inhibitors were tested against this parasite target. One of them, posaconazole, was even tested in clinical trials that unfortunately were not successful. Nevertheless, there is still much evidence that CYP51 is a promising target for treating T. cruzi infection. The search for new effective molecules that can cure the chronic phase of the disease is essential. 2D and 3D Quantitative Structure-Activity Relationship (QSAR) studies were conducted in this work to create three QSAR models using the chemical structures of 197 published compounds that had already been tested either in vivo or in vitro. After analysis of the models, new analogues not yet synthesized were proposed, and their biological activity and synthetic accessibility were assessed.

4
  • TAYNÁ DA SILVA FIÚZA
  • In silico Investigation of epitopes from Mycobacterium avium subsp. hominissuis strains as vaccine candidates

  • Advisor : GUSTAVO ANTONIO DE SOUZA
  • COMMITTEE MEMBERS :
  • GUSTAVO ANTONIO DE SOUZA
  • TETSU SAKAMOTO
  • HELENA PAULA BRENTANI
  • Data: Dec 4, 2019


  • Show Abstract
  • Non-tuberculous mycobacteria are environmental mycobacteria responsible for a growing number of systemic and respiratory infections, affecting mostly children, the elderly and immunocompromised individuals. The Mycobacterium avium Complex (MAC) comprises Mycobacterium avium as well as M. intracellulare and is the major cause of the cases reported to this day. M. avium has recently been classified as containing four subspecies with different infectivities and different hosts. One of those subspecies, Mycobacterium avium subsp. hominissuis, has been isolated from humans and swine, whereas other varieties are found in cattle, birds and wild animals. To date, MAC infections are controlled with multiple antibiotics through long, expensive and sometimes inefficient treatment regimens. The identification of effective targets for controlling such organisms is an essential and challenging task, as surface proteins, which are key target molecules in several successful immunotherapies, are difficult to isolate. In addition, the design of immunotherapies and vaccine formulations depends on the identification of peptides of immunological interest, which are usually found through repetitive and expensive experimental protocols. This study applied computational tools to investigate surface proteins with exposed immunogenic portions that are ubiquitous across strains of Mycobacterium avium subsp. hominissuis. To achieve that, 32648 amino acid sequences obtained from the NCBI database for Mycobacterium avium subsp. hominissuis were submitted to TMHMM for detection of alpha-helical transmembrane domains, which were present in 3426 of those sequences. These proteins were clustered into 577 groups by CMG Biotools according to their homology, so as to identify membrane proteins common to all the organisms of interest. Those sequences were then submitted to methods available at IEDB to classify their affinity to a list of 27 MHC alleles frequent in human populations.
Peptides with the highest predicted immunogenicities were selected, yielding 112 clusters with core proteins and high MHC affinities. Crossing information between IEDB and TMHMM allowed for the selection of the 58 clusters in which at least one peptide was predicted to be located on the outer portion of the membrane. We also calculated peptide (a) conservation (their presence in different strains), where 60% of clusters are formed by ubiquitous peptides, and (b) promiscuity (the number of distinct MHCs to which they bind), where only a single cluster has a peptide that binds to four distinct MHCs with high affinity. As vaccine epitope candidates, a minimum set of nine peptides with high binding affinity to the highest possible number of distinct MHCs was selected; together they interact with 15 molecules. None of those nine sequences showed potential to cross-react with human or swine proteins. The protocol executed in this work can be applied to other organisms as a means to identify possible vaccine candidates.

5
  • RAUL MAIA FALCÃO
  • AUTOSOMAL ALPORT: A STUDY OF TWO NORTE-RIO-GRANDENSE FAMILIES

  • Advisor : Jorge Estefano de Santana Souza
  • COMMITTEE MEMBERS :
  • Jorge Estefano de Santana Souza
  • SELMA MARIA BEZERRA JERONIMO
  • VALDIR BALBINO
  • Data: Dec 19, 2019


  • Show Abstract
  • Alport syndrome (AS) is a rare, genetically heterogeneous hereditary pathology associated with germline mutations in collagen type IV genes (COL4A3, COL4A4 and COL4A5). It is characterized by progressive loss of renal function, hearing and eye damage during early childhood, and it progresses to end-stage renal disease, often associated with renal failure. Studies aimed at the early diagnosis of individuals with this nephropathy may lead to appropriate treatment and thus improve life expectancy. Efforts focused on the genome of patients are currently underway to create diagnostic tests for rare diseases and syndromes. From this perspective, identifying the mutations, genes and metabolic pathways involved in the pathology is crucial to understanding the complexity of these diseases. To corroborate existing findings and studies on AS, exome sequencing was performed for two families from Rio Grande do Norte (RN), each composed of 4 individuals. Variants were called with the GATK and VarScan2 software, followed by screening for deleterious variants with an in-house script. The results pointed to two deleterious variants in the genes that encode the type IV collagen α3 and α4 chains (a stop codon in COL4A3 and a frameshift in COL4A4), leading to premature protein truncation. Both variants were detected in the homozygous state in the probands and in the heterozygous state in the other family members. Additionally, a broad region of runs of homozygosity (ROH) involving the COL4A3 and COL4A4 genes was detected in the probands of both families. Given the finding of deleterious variants in the COL4A3 and COL4A4 genes within ROH regions, these variants are now related to AS, so that similar observations can support possible targets for the creation of new diagnostic tests and for genetic counseling services.

6
  • THIAGO DANTAS SOARES
  • BIO-DIA: WEB-BASED TOOL FOR DATA AND ALGORITHMS INTEGRATION

  • Advisor : WILFREDO BLANCO FIGUEROLA
  • COMMITTEE MEMBERS :
  • ALBERTO SIGNORETTI
  • RODRIGO JULIANI SIQUEIRA DALMOLIN
  • WILFREDO BLANCO FIGUEROLA
  • Data: Dec 19, 2019


  • Show Abstract
  • Data science is becoming a difficult field to work in, not only because of the huge amount of data and its variety of formats, but also because of the need for collaboration among several specialists in order to retrieve valuable information. In this context, we created Bio-DIA, an online software tool for building projects focused on the integration of data and algorithms. The results obtained in one project can be reused in other projects without specific programming knowledge. The software was created with Angular in the front end and Django in the back end, with Spark to handle big-data problems such as the variety of formats; the only requirement for using the system is to follow a specific XML pattern. The Bio-DIA application facilitated collaboration among users, allowing research groups to share data, scripts and information.

Thesis
1
  • CLOVIS FERREIRA DOS REIS
  • Systems Biology-Based Analysis of Transcriptional Data in Lead-Treated Human Neural Progenitor Cells

  • Advisor : RODRIGO JULIANI SIQUEIRA DALMOLIN
  • COMMITTEE MEMBERS :
  • BEATRIZ STRANSKY FERREIRA
  • DIEGO BONATTO
  • MATHEUS AUGUSTO DE BITTENCOURT PASQUALI
  • RODRIGO JULIANI SIQUEIRA DALMOLIN
  • VIVIANE SOUZA DO AMARAL
  • Data: Nov 1, 2019


  • Show Abstract
  • The consequences of lead poisoning are diverse and relevant to human health. Reaching all organ systems, it mainly affects the nervous system, with severe and irreversible implications for neurodevelopment, memory consolidation, and learning processes in children. Lead interacts with cellular components in many ways, affecting ion-binding proteins, signal transduction proteins, transmembrane ion channels, and transcription factors. While the symptoms of lead poisoning are well known, little is known about its systemic effects and its impact on the modulation of neuronal cell transcription. In order to investigate such effects from a systems biology perspective, we applied the transcriptogramer R/Bioconductor package pipeline to evaluate the transcriptional profile of human neural progenitor cells (NPCs) treated with 30 μM lead acetate for 26 days. The transcriptogramer algorithm is designed to identify functionally associated and differentially expressed gene groups in case-control experiments in an unsupervised way. It was able to identify eleven differentially expressed clusters between days 3 and 11 of the lead treatment. Of these, seven showed negative regulation of several cellular systems involved in cell differentiation, such as cytoskeleton organization and RNA and protein biosynthesis, characterized by large and tightly connected networks. The four clusters that were positively regulated showed sparse and poorly connected nodes, mainly related to transcription, transmembrane transport, and signal transduction. In the subsequent period, covering days 12 to 26 of treatment, it was possible to observe a massive alteration of the cellular transcription profile, with interference in all layers of gene expression regulation.
Thus, our results suggest that lead induces significant transcriptional modifications in NPCs, which can be correlated with damage to and/or adaptations of various systems, all resulting from intoxication by this heavy metal, thus influencing the result of ES-NP cell differentiation.

2
  • BRUNO MATTOS SILVA WANDERLEY
  • flowDiv: A New Pipeline for Analyzing Flow Cytometric Diversity

  • Advisor : ADRIAO DUARTE DORIA NETO
  • COMMITTEE MEMBERS :
  • ADRIAO DUARTE DORIA NETO
  • DANIEL SABINO AMORIM DE ARAUJO
  • Jorge Estefano de Santana Souza
  • ANDRE MEGALI AMADO
  • FERNANDO UNREIN
  • ROSEMBERG FERNANDES DE MENEZES
  • Data: Nov 25, 2019


  • Show Abstract
  • Flow cytometry (FCM) is an analytical technique based on the spectroscopic characterization of particulates. This technique allows the quantitative and qualitative description of a wide range of cellular systems within seconds and at relatively low cost. Such features make it a ubiquitous tool in both industrial and academic analytical protocols. The environmental sciences have faced clear obstacles with regard to the structuring of FCM protocols: the highly heterogeneous nature of environmental samples makes it difficult to design protocols that balance standard mathematical reasoning and the intrinsic biological meaning of the system under study. Several approaches have been devised to correct these incongruities; among them, the idea of cytometric diversity (the study of FCM data based on numerical ecology methods) has been quite promising. However, despite the availability of solutions, many technical challenges still need to be overcome. In this work, we develop and apply a new computational tool, flowDiv, specially designed for the analysis of the cytometric diversity of environmental data. Here, in addition to detailing the logic behind the method and comparing it to similar computational strategies, we apply it to real problems, revealing how some important ecological factors, such as nutritional status, affect the cytometric diversity of microbial groups in natural lakes in Argentine Patagonia and northeastern Brazil.

3
  • VANDECLECIO LIRA DA SILVA
  • Bioinformatics applied on identification of cancer/testis genes and their association with prognosis in a pan-cancer analysis.

  • Advisor : SANDRO JOSE DE SOUZA
  • COMMITTEE MEMBERS :
  • RODRIGO JULIANI SIQUEIRA DALMOLIN
  • SANDRO JOSE DE SOUZA
  • SIDNEY EMANUEL BATISTA DOS SANTOS
  • TIRZAH BRAZ PETTA
  • ÂNDREA KELY CAMPOS RIBEIRO DOS SANTOS
  • Data: Dec 4, 2019


  • Show Abstract
  • Cancer/testis (CT) genes are excellent candidates for cancer immunotherapies because of their restricted expression in normal tissues and their capacity to elicit an immune response when expressed in tumor cells. In this study, we provide a genome-wide screen for CT genes, identifying 745 putative CT genes. Comparison with a set of known CT genes shows that 201 new CT genes were identified. Integration of gene expression and clinical data led us to identify dozens of CT genes associated with either good or poor prognosis. For the CT genes related to good prognosis, we show that there is a direct relationship between CT gene expression and a signal for CD8+ cell infiltration in some tumor types, especially melanoma. In addition, we contextualized bioinformatics in a big data scenario.

2018
Dissertations
1
  • ELIONAI MOURA CORDEIRO
  • Autogating in Flow Cytometry Data using SVM Classifiers for Bacterioplankton Identification

  • Advisor : ADRIAO DUARTE DORIA NETO
  • COMMITTEE MEMBERS :
  • ADRIAO DUARTE DORIA NETO
  • ARAKEN DE MEDEIROS SANTOS
  • DANIEL SABINO AMORIM DE ARAUJO
  • Jorge Estefano de Santana Souza
  • Data: Mar 22, 2018


  • Show Abstract
  • This master's thesis presents the results of a proposed methodology for bacterioplankton identification using a machine learning approach, support vector machines (SVM). The samples were taken from 19 high-altitude lakes in the Pyrenees. After being analyzed by a specialist, the samples yielded 74 datasets that served as input to the algorithm. The method proved viable, with an identification error of 3.35%. Furthermore, the robustness of the prediction models does not depend on the complexity of the input data alone: the algorithm settings, the cost function and the choice of variables also play an important role in performance.

2
  • LUCAS FELIPE DA SILVA
  • TFAT: Transcription factor analysis through data integration and scalable metrics

  • Advisor : Jorge Estefano de Santana Souza
  • COMMITTEE MEMBERS :
  • Jorge Estefano de Santana Souza
  • RODRIGO JULIANI SIQUEIRA DALMOLIN
  • WILFREDO BLANCO FIGUEROLA
  • Data: Mar 28, 2018


  • Show Abstract
  • Several tools are currently available for the analysis of Transcription Factors (TFs), such as TFCheckpoint, JASPAR, SSTAR, GTRD and Enrichr. However, none of these tools offers a complete experience in which the reliability of a TF can be evaluated, that is, whether an analyzed protein is in fact a TF and whether it is associated with the target gene. Numerous databases have been built over time, all of them with very rich information, but the intrinsic complexity of the data, the volume of information, problems of gene nomenclature and several other factors have meant that such tools do not offer a complete spectrum of analysis. Moreover, working with a large volume of data requires advanced computer skills, while most of those interested in analyzing these data are professionals from the biological sciences. This is a barrier, since academic training in this area does not include programming courses in its curriculum. Faced with this situation, this work aims to create a web tool exclusively for the analysis of TFs. Transcription Factor Analysis Tools (TFAT) was designed and developed to integrate different databases with a set of scripts that manipulate this information, together with crucial parameters defined by the user in the analysis. The core of the tool is an analysis that identifies the key TFs in the modularization of gene transcription, that is, the enrichment of the regulatory TFs for a list of genes submitted by the user: the integrated scripts query the database, identify the TFs associated with the listed genes and calculate the enrichment p-value. In addition, the tool verifies TF reliability, makes predictions available, and converts items from a list to Entrez Gene GeneIDs or Symbols.
Another feature of this work is the use of TF reliability throughout the tool. This degree of reliability takes into account evidence from different databases, experiments, predictions and other characteristics of TFs. With a standard mode and a user-defined mode, this reliability feature gives the end user full customization through filters in the queries and control over the analysis.
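The abstract above does not say which statistical test TFAT uses for the enrichment p-value. As a rough illustration only, the sketch below assumes a one-sided hypergeometric test, which is a common choice for this kind of over-representation analysis; the function name and its parameters are hypothetical and are not taken from TFAT.

```python
from math import comb

def hypergeom_enrichment_p(population, tf_targets, submitted, overlap):
    """One-sided hypergeometric tail P(X >= overlap).

    Assumed setting (not confirmed by the source): `population` genes in the
    background, `tf_targets` of them associated with a given TF, and a user
    list of `submitted` genes of which `overlap` are targets of that TF.
    """
    upper = min(tf_targets, submitted)
    total = comb(population, submitted)
    tail = sum(
        comb(tf_targets, k) * comb(population - tf_targets, submitted - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Toy numbers: 10 background genes, 5 TF targets, 5 submitted genes,
# all 5 of which are targets of the TF.
p_value = hypergeom_enrichment_p(10, 5, 5, 5)
```

A small p-value here would mean that the overlap between the submitted list and the TF's targets is unlikely under random sampling; a real analysis would also correct for testing many TFs at once.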

3
  • DANIEL GARCIA TEIXEIRA
  • A role for feedforward inhibition in regulating gamma-frequency oscillation induced by feedback inhibition

  • Advisor : CESAR RENNO COSTA
  • COMMITTEE MEMBERS :
  • CESAR RENNO COSTA
  • RODRIGO JULIANI SIQUEIRA DALMOLIN
  • RENAN CIPRIANO MOIOLI
  • WILFREDO BLANCO FIGUEROLA
  • Data: Mar 29, 2018


  • Show Abstract
  • Gamma oscillation is present in several areas of the brain, such as the hippocampus, where it is an important mechanism for memory function. We found several models capable of explaining the generation of gamma oscillations and their two functions: synchronously grouping the synaptic activity of neurons and selecting which neurons fire in each cycle of this synchrony. These functions give the system a computational role in neural processing, such as pattern separation and the formation of neural assemblies. However, our analysis shows that these existing models are very sensitive to variations in brain activity: they are strongly affected by variations in their input layers and therefore appear to lack robustness, with large variation in output frequency, including among neurons. By considering an important part of the biological circuit not included in previous studies, a feedforward inhibition network, we were able to create a new model. Based on the Izhikevich neuron model, the new model is more robust to variations in the input layer, has a reduced computational cost and is closer to the biological circuit. With this new model, it will be possible to create neural networks with a larger number of neurons at reduced computational cost, and to analyze the individual behavior of each neuron in the model.

4
  • THAÍS DE ALMEIDA RATIS RAMOS
  • Development and application of CORAZON: a normalization and clustering tool for genomic data

  • Advisor : JOSÉ MIGUEL ORTEGA
  • COMMITTEE MEMBERS :
  • GUSTAVO HENRIQUE ESTEVES
  • JOSÉ MIGUEL ORTEGA
  • RODRIGO JULIANI SIQUEIRA DALMOLIN
  • THAIS GAUDENCIO DO REGO
  • VINICIUS RAMOS HENRIQUES MARACAJA COUTINHO
  • Data: May 11, 2018


  • Show Abstract
  • The creation of gene expression encyclopedias makes it possible to understand which groups of genes are co-expressed in different tissues and to characterize gene clusters according to their functions and origin. Due to the huge amount of data generated in large-scale transcriptomics projects, techniques from artificial intelligence have become widely used in bioinformatics. Unsupervised learning is the machine learning task that analyzes the data provided and tries to determine whether some objects can be grouped in some way, forming clusters. We developed an online tool called CORAZON (Correlation Analyses Zipper Online), which implements three unsupervised machine learning algorithms (mean shift, k-means and hierarchical) to cluster gene expression datasets, six normalization methodologies (Fragments Per Kilobase Million (FPKM), Transcripts Per Million (TPM), Counts Per Million (CPM), base-2 log, normalization by the sum of the instance's values and normalization by the highest attribute value for each instance), and a strategy to observe the influence of the attributes, all in a friendly environment. The performance of the algorithms was evaluated on five models commonly used to validate clustering methodologies, each composed of fifty randomly generated datasets. The algorithms achieved accuracies ranging from 92% to 100%. Next, we applied our tool to cluster tissues, to obtain evolutionary knowledge and functional insights about genes based on Gene Ontology enrichment, and to connect them with transcription factors. To select the best number of clusters for the k-means and hierarchical algorithms, we used the Bayesian information criterion (BIC), followed by the derivative of the discrete function and the Silhouette score. For hierarchical clustering, we adopted Ward's method.
In total, we analyzed three databases (Uhlen, Encode and Fantom). For tissues, we observed groups related to glands, cardiac tissues, muscular tissues and tissues of the reproductive system, and in all three databases we observed groups containing a single tissue, such as testis, brain and bone marrow. For gene clusters, we obtained several clusters with specific functions: detection of stimulus involved in sensory perception, reproduction, synaptic signaling, nervous system, immune system, system development, and metabolism. We also observed that in clusters with more than 80% noncoding genes, more than 40% of the coding genes are recent, having appeared in the class Mammalia, while a minority date from Eukaryota. Conversely, in clusters with more than 90% coding genes, more than 40% of them appeared in Eukaryota and a minority in Mammalia. These results illustrate the potential of the methods in the CORAZON tool, which can help in the analysis of large quantities of genomic data, making it possible to analyze potential associations between noncoding RNAs and the biological processes of the coding genes clustered with them, as well as to study evolutionary history. CORAZON is freely available at http://biodados.icb.ufmg.br/corazon or http://corazon.integrativebioinformatics.me.
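Several of the normalizations named in the abstract above have simple textbook definitions. The sketch below illustrates four of them (CPM, TPM, normalization by the sum, and normalization by the highest value) on a toy sample; it follows the standard formulas and is not CORAZON's own code. The function names and the toy numbers are hypothetical.

```python
def cpm(counts):
    """Counts per million: scale each count by the sample's total count."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]

def tpm(counts, lengths_kb):
    """Transcripts per million: divide by gene length (kb), then rescale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_kb)]
    total = sum(rates)
    return [r * 1e6 / total for r in rates]

def scale_by_sum(values):
    """Divide each value of an instance by the sum of its values."""
    total = sum(values)
    return [v / total for v in values]

def scale_by_max(values):
    """Divide each value of an instance by its highest value."""
    top = max(values)
    return [v / top for v in values]

# Toy sample: read counts for three genes and their lengths in kilobases.
counts = [100, 300, 600]
lengths_kb = [1.0, 3.0, 2.0]
cpm_values = cpm(counts)
tpm_values = tpm(counts, lengths_kb)
```

In this toy sample the first two genes have the same TPM even though their raw counts differ, because the second gene is three times longer; CPM ignores gene length and therefore ranks them differently.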

5
  • DIEGO ARTHUR DE AZEVEDO MORAIS
  • Transcriptogramer: R Package for Transcriptional Analysis

  • Advisor : RODRIGO JULIANI SIQUEIRA DALMOLIN
  • COMMITTEE MEMBERS :
  • Jorge Estefano de Santana Souza
  • MAURO ANTONIO ALVES CASTRO
  • RODRIGO JULIANI SIQUEIRA DALMOLIN
  • Data: Jun 29, 2018


  • Show Abstract
  • The transcriptogram, a method used in transcriptome analysis, uses protein-protein interaction data to build an ordered gene list. On this list, genes are placed such that the probability of interaction between their products decreases exponentially as the distance between their positions increases. The ordered gene list is then used to calculate the average expression value of functionally associated genes in a window with a settable radius, allowing differential expression analysis of non-predefined gene sets in case-control studies. This study aims to implement an R package that uses transcriptograms and integrates features from packages well known in the scientific community, and that is able to perform differential expression analysis, functional enrichment, and network visualization. The transcriptogramer package was implemented and is available at Bioconductor, a repository for open source software developed in the R language for use in bioinformatics. In a comparison between transcriptogramer and a pipeline combining features from the limma and topGO packages, transcriptogramer identified nearly 10 times more significantly enriched Gene Ontology terms, among which were most of the terms identified by the conventional pipeline.
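The windowed average described in the abstract above can be sketched in a few lines. This is a simplified Python illustration, not the transcriptogramer R package itself: it assumes the genes are already in their final order and simply truncates the window at the ends of the list, whereas the package may treat the boundaries differently. The function name and the toy values are hypothetical.

```python
def window_average(expression, radius):
    """Average each position with its neighbours up to `radius` positions away.

    `expression` holds one expression value per gene, in the ordered gene
    list. Windows are truncated at the list ends (an assumption of this sketch).
    """
    n = len(expression)
    smoothed = []
    for i in range(n):
        start = max(0, i - radius)
        stop = min(n, i + radius + 1)
        window = expression[start:stop]
        smoothed.append(sum(window) / len(window))
    return smoothed

# Toy ordered list of four genes, smoothed with radius 1.
smoothed = window_average([1.0, 3.0, 5.0, 7.0], radius=1)
```

Because neighbouring genes on the ordered list tend to be functionally associated, averaging over a window gives a profile of gene groups rather than of individual genes, which is what allows differential expression to be tested on sets that were not defined in advance.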

6
  • PAULO ROBERTO BRANCO LINS
  • Uncovering association networks through an eQTL analysis involving human miRNAs and lincRNAs

  • Advisor : JUNIOR BARRERA
  • COMMITTEE MEMBERS :
  • SANDRO JOSE DE SOUZA
  • WILFREDO BLANCO FIGUEROLA
  • GUILHERME SUAREZ KURTZ
  • Data: Jul 19, 2018


  • Show Abstract
  • Variations in the level of gene expression are among the main causes of phenotypic diversity in organisms, including the development of pathologies and the response to drugs in humans. Non-coding RNAs (ncRNAs) play an important role in the complex mechanism of regulatory networks. Although not yet fully understood, two classes of ncRNAs have emerged in recent research as protagonists in the development of clinical conditions: microRNAs (miRNAs) and long intergenic non-coding RNAs (lincRNAs). The present work therefore integrated public data to catalog the vast landscape of the regulatory effects of miRNAs and lincRNAs in the human genome. Through expression Quantitative Trait Loci (eQTL) analysis, variations with a putative effect on gene expression were identified. Association networks were also created, relating the eQTL analysis results to traits of clinical and/or pharmacological relevance. This revealed associations that may motivate new studies on the subject. Mental and coronary disorders, in addition to cancer, were the most prominent traits in the study results.

7
  • KARLA CRISTINA TABOSA MACHADO
  • Development of computational approaches for prokaryote proteogenomics

  • Advisor : GUSTAVO ANTONIO DE SOUZA
  • COMMITTEE MEMBERS :
  • GUSTAVO ANTONIO DE SOUZA
  • JOAO PAULO MATOS SANTOS LIMA
  • LUCIANO FERNANDES HUERGO
  • Data: Jul 27, 2018


  • Show Abstract
  • The development of next-generation sequencers caused a revolution in genomic research, and nowadays the complete genomic information of thousands of bacterial strains is available. Similar technological breakthroughs in sensitivity and throughput also occurred in protein analysis by mass spectrometry (MS) in the last decade. Proteomics has yet to reach the same level of throughput as genomics, but for samples from simple organisms such as yeasts or bacteria, it is able to detect and quantify the proteome close to completeness. There are still challenges in the characterization of coding regions in a genome, as well as in the validation of genomic models. Scientific reports show that genomic annotations performed on the same genomic data using independent approaches give divergent results regarding the number of predicted ORFs and also their length (i.e. different choices for transcription/translation initiation). Peptide sequence characterization in proteomics samples can be used to validate genomic regions as coding, a research field known as proteogenomics. This requires the design of customized sequence databases that allow the identification of new genomic regions previously predicted to be non-coding and therefore absent from routinely employed databases. In this work, a computational strategy was developed that builds customized protein sequence databases through the processing and analysis of protein sequence data from several strains of the same bacterial species. The approach identifies and compares homologous and uniquely annotated proteins in all strains, and reports those sequences in a non-redundant manner, meaning that sequences extensively repeated among annotations are reported only once in order to keep the size of the search space under control.
The databases also report sequence variations, whether they result from genetic variations or annotation divergences, which are usually neglected in databases used in proteomic analysis. Besides the databases, a log file was also created in which each observation regarding the presence of homologs, sequence differences, modification type and presence in strains is described in detail. In order to evaluate whether the generated databases produced relevant sequences without loss of information compared to the original sequences, MS data collected from clinical strains of Mycobacterium tuberculosis were submitted to protein identification. The database created with this approach was compared with a database formed by the mere concatenation of all the proteins annotated in M. tuberculosis. The computational time was reduced, while the number of identifications obtained in the two searches was practically identical. Finally, databases were created for 10 bacterial species with at least 65 characterized strains each. Analysis of these databases showed that the greater the diversity of the pangenome of a bacterial species, the greater the number of proteins and peptides expected. The results also demonstrate the possibility of using this strategy to create databases containing sequences from multiple species, in order to perform metaproteomic analyses of MS data.

8
  • ARANTHYA HEVELLY DE LIMA COSTA
  • ENERGY ANALYSIS OF THE INTERACTION OF ESTRADIOL AND DIETHYLSTILBESTROL WITH ERα

  • Advisor : UMBERTO LAINO FULCO
  • COMMITTEE MEMBERS :
  • RODRIGO JULIANI SIQUEIRA DALMOLIN
  • UMBERTO LAINO FULCO
  • VALDER NOGUEIRA FREIRE
  • Data: Aug 10, 2018


  • Show Abstract
  • Breast cancer is a hormone-dependent disease with several different subtypes, patterns of gene expression and distinct manifestations (CHENG et al., 2002). According to the National Cancer Institute (INCA), the disease caused 14,388 deaths in 2013, 181 of them men and 14,207 women. The estimate for 2015 is 57,120 new cases. Most breast cancers are ER+ (estrogen receptor positive), i.e., 17β-estradiol dependent. In this type of breast neoplasm, the number of ERα (estrogen receptor alpha subtype) is higher than the number of ERβ (estrogen receptor beta subtype), evidencing the importance of the alpha subtype in this disease. The purpose of this work is to measure the individual binding energies of ERα residues with 17β-estradiol and diethylstilbestrol using computational simulation. For this purpose, Density Functional Theory (DFT) and the Molecular Fractionation with Conjugate Caps (MFCC) method are employed. The results showed that the residues with the most significant energy values are GLU353, LEU391, MET343, LEU346, MET388, ARG394, PHE404, HIS524, ASP411, LEU525, ARG352 and ARG548. These results help characterize the interaction of the agonists 17β-estradiol and diethylstilbestrol with ERα and, in turn, can be used as a basis for structure-based drug design studies, both to modify existing drugs and to design new ones.

9
  • PRISCILLA MACHADO DO NASCIMENTO
  • Implementation of Functions for a Platform of Genomic Variants Analysis

  • Advisor : Jorge Estefano de Santana Souza
  • COMMITTEE MEMBERS :
  • Jorge Estefano de Santana Souza
  • BEATRIZ STRANSKY FERREIRA
  • MATHEUS AUGUSTO DE BITTENCOURT PASQUALI
  • Data: Sep 21, 2018


  • Show Abstract
  • Current scientific advances in genomics have been made possible by the extraction of significant information from DNA using new technologies available for the analysis of genetic data. Precision medicine builds on these technological advances to better understand genetic constitution and the possible changes that may lead to diseases with patient-specific differential responses to treatments. Considering genetic mutation as one of the drivers of evolution, and with the goal of better understanding its effects, the present work aims to contribute to future analyses of mutation data, helping in the future identification of new hotspots and SNPs. For this analysis, a software product was developed to support the handling of the collected data, so that they can be analyzed efficiently and visualized more precisely. This work proposes the implementation of new functionalities that can add more value to the aforementioned software, contributing directly to the automation and improvement of the processes performed by the variant analysis tools available on the market. As an application of what was developed, an analysis of the public data used to annotate the variants in the system was proposed. For this, a study will be carried out on the data from existing predictors, so that their accuracy can be verified against the clinical data recorded in ClinVar. In order to extract data that demonstrate the relevance of the false positive/negative analysis of the existing predictors, a prototype process was proposed that aims to improve the accuracy of the SNPs identified by the system.

10
  • MARCEL DA CÂMARA RIBEIRO DANTAS
  • Reverse engineering of Ewing Sarcoma regulatory network uncovers PAX7 and RUNX3 as master regulators associated with good prognosis.

  • Advisor : RODRIGO JULIANI SIQUEIRA DALMOLIN
  • COMMITTEE MEMBERS :
  • RODRIGO JULIANI SIQUEIRA DALMOLIN
  • CESAR RENNO COSTA
  • MATHEUS AUGUSTO DE BITTENCOURT PASQUALI
  • Date: Sep 21, 2018


  • Show Abstract
  • Ewing Sarcoma (ES) is a rare malignant bone tumor with a high propensity to metastasize, occurring most frequently in adolescents and young adults. No ES cell of origin has been identified so far, and the hallmark of this cancer is a chromosomal translocation between chromosomes 11 and 22 that produces an aberrant transcription factor through the fusion of a gene from the FET family with a gene from the ETS family, commonly EWSR1 and FLI1. The translocation is associated with chromatin alteration, leading to a significant disturbance in the cell transcriptome. The regulatory mechanisms behind the observed ES transcriptional alterations remain poorly understood. Here, we inferred the transcriptional regulatory network of Ewing Sarcoma and identified 7 transcription factors as potential master regulators. According to our results, these 7 master regulators are organized in two clusters: one composed of PAX7 and RUNX3 and the other composed of ARNT2, CREB3L1, GLI3, MEF2C, and PBX3. The master regulators within each cluster act as agonists of one another, and the two clusters are antagonistic to each other. Based on transcriptional data, we classified ES patients from two cohorts according to the activity of each of the seven regulons. High regulatory activity of PAX7 and RUNX3 is associated with better overall survival, and high regulatory activity of ARNT2, CREB3L1, GLI3, and PBX3 is associated with worse overall survival. This work contributes to a better understanding of the regulome of Ewing Sarcoma, indicating putative master regulators that may support prognosis prediction and point to key factors of tumorigenesis.

11
  • STHEPHANIE NASSIF PINHEIRO
  • CHARACTERIZATION OF THE 18S rRNA GENE IN PROTOZOA OF THE PHYLUM APICOMPLEXA: AN APPROACH APPLIED TO THE DESIGN OF MOLECULAR MARKERS

  • Advisor : DANIEL CARLOS FERREIRA LANZA
  • COMMITTEE MEMBERS :
  • DANIEL CARLOS FERREIRA LANZA
  • KATIA CASTANHO SCORTECCI
  • CLAUDIO BRUNO SILVA DE OLIVEIRA
  • Date: Sep 26, 2018


  • Show Abstract
  • The phylum Apicomplexa comprises protozoa of various genera that cause parasitic diseases worldwide, such as malaria, toxoplasmosis, and opportunistic intestinal disorders. Currently, protozoa of medical importance are generally identified by light microscopy, which makes accurate classification, diagnosis, and prognosis difficult, particularly in cases where the level of infection is low. In this context, the present work aimed to develop an alternative molecular method that allows the identification of a wide range of protozoa of the phylum Apicomplexa. To this end, a primer system was developed for use in a semi-nested PCR (polymerase chain reaction). The target investigated for primer design was the 18S rDNA region, as it is a widely used template for screening and species identification in biodiversity studies. Based on the structural analysis and the ribosomal nucleic acid sequence, sets of primers were designed that anneal to conserved regions and flank variable regions of the gene. The efficiency of each set of primers was assessed by in silico PCR, and the resulting amplicons were evaluated. A set of primers was selected which, when used in a nested fashion, can generate ~166 amplicons with distinct sequences. These can be used to discriminate genera and species of Apicomplexa by differences in amplicon size on agarose gel, and species by sequencing (Sanger method or next-generation sequencing). The proposed method was validated in vitro, and its efficiency for the identification of some protozoan species of medical interest was confirmed. After further validation steps, this method can be used for initial screening in cases of suspected parasitosis and also for parasite species determination.
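The in silico PCR step described in this abstract can be illustrated with a minimal Python sketch. It is not the tool used in the dissertation: the function names, the toy template, and the primers are hypothetical, and the search assumes exact primer matches on a single strand, whereas real in silico PCR tools usually tolerate mismatches.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def in_silico_pcr(template, forward, reverse):
    """Return the amplicon bounded by two exactly matching primers, or None.

    The forward primer is searched on the given strand; the reverse primer
    anneals to the opposite strand, so its reverse complement is searched
    downstream of the forward primer site.
    """
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


if __name__ == "__main__":
    # Toy template (hypothetical): flanks + conserved sites + variable middle.
    template = "GGG" + "ACGTAC" + "TTTTT" + "GATTCA" + "CCC"
    amplicon = in_silico_pcr(template, forward="ACGTAC", reverse="TGAATC")
    print(amplicon, len(amplicon))
```

Comparing the amplicon lengths obtained for different templates mirrors the idea of separating taxa by amplicon size, while the amplicon sequences themselves correspond to the sequencing-based discrimination.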

12
  • LAISE CAVALCANTI FLORENTINO
  • Using RINs to understand cancer mutations: deleterious mutations are more commonly associated to highly connected amino acids.

  • Advisor : JOAO PAULO MATOS SANTOS LIMA
  • COMMITTEE MEMBERS :
  • JOAO PAULO MATOS SANTOS LIMA
  • Jorge Estefano de Santana Souza
  • VALDIR BALBINO
  • Date: Oct 31, 2018


  • Show Abstract
  • In recent decades, advances in whole-genome approaches have led to the identification of a vast number of cancer-related mutations. High-throughput estimation of the impact of cancer mutations on protein structure is not easily accomplished, and most studies are limited to one-by-one whole structural analyses. Moreover, many challenges remain on the way to the precise and automated prediction of pathogenic mutations. Therefore, understanding the structural impact of a particular amino acid change is of great importance for cancer medical research. However, most studies have emphasized sequence and structural modifications based on the chemical characteristics of amino acids rather than fold features, in which the conservation of non-covalent interactions plays a significant role. Hence, in the present study, we used residue interaction networks (RINs) for large-scale analysis of cancer missense mutations in order to infer their effects on the conservation of non-covalent interactions. We hypothesize that changes in highly connected amino acids are more likely to cause deleterious mutations. To evaluate this, we retrieved cancer missense mutations from the COSMIC (cancer.sanger.ac.uk/cosmic) and TCGA (cancergenome.nih.gov) databases and mapped them to their respective structures retrieved from the Protein Data Bank (rcsb.org). Then, RINs were constructed from the obtained PDB files, and network parameters such as node degree, edge type, clustering coefficient, and weighted betweenness were assessed and plotted using R scripts. Later, we compared these results against reported missense single nucleotide polymorphisms retrieved from dbSNP (www.ncbi.nlm.nih.gov/projects/SNP/) and against pathogenic and non-pathogenic cancer mutations from the ClinVar (www.ncbi.nlm.nih.gov/clinvar/) database.
Our results demonstrate that the distribution of mutations per degree (node connectivity) differs significantly from random Monte Carlo simulations and also from the distribution of a set of human single nucleotide polymorphisms (SNPs), with mutations tending to remain at nodes with lower connectivity. In addition, the proportion of deleterious mutations was significantly higher in nodes with a high degree of connectivity when two different criteria were used for their classification: the proportion of software predictors (Ndamage) and the clinical classification obtained from ClinVar. Taking these results into account, we conclude that changes in highly connected amino acids are indeed more likely to generate deleterious mutations, given the higher proportion of such mutations at these nodes. Our results also indicate that the conservation of non-covalent interactions is an important parameter to consider when assessing the effects of mutations, and that RIN analyses can be used as an additional parameter to aid in the prediction of deleterious mutations in cancer.
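The comparison of mutated-residue connectivity against random Monte Carlo simulations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study used R scripts, and the function names, the toy edge list, and the choice of mean degree as the test statistic are all hypothetical here.

```python
import random
from collections import defaultdict


def node_degrees(edges):
    """Compute the degree of every residue in an undirected interaction network."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)


def monte_carlo_p(degree, mutated, n_iter=10000, seed=0):
    """Compare the mean degree of mutated residues with random residue sets.

    Returns the observed mean degree and an empirical one-sided p-value:
    the fraction of random samples (same size, drawn without replacement)
    whose mean degree is at least as high as the observed one.
    """
    rng = random.Random(seed)
    nodes = list(degree)
    k = len(mutated)
    observed = sum(degree[m] for m in mutated) / k
    hits = 0
    for _ in range(n_iter):
        sample = rng.sample(nodes, k)
        if sum(degree[s] for s in sample) / k >= observed:
            hits += 1
    # The +1 correction keeps the estimate away from an impossible p = 0.
    return observed, (hits + 1) / (n_iter + 1)


if __name__ == "__main__":
    # Toy network (hypothetical): residue A is a hub contacting B, C, D and E.
    edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("B", "C")]
    degree = node_degrees(edges)
    print(monte_carlo_p(degree, mutated=["A"], n_iter=2000))
```

In a real analysis the edges would come from the non-covalent contacts of a RIN built from a PDB structure, and the same resampling could be repeated separately for deleterious and neutral mutation sets.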

13
  • CAYRO DE MACÊDO MENDES
  • IN SILICO CHARACTERIZATION OF VARIABLE ORFs AND REGULATORY REGIONS IN WHITE SPOT SYNDROME VIRUS GENOME (WSSV)

  • Advisor : DANIEL CARLOS FERREIRA LANZA
  • COMMITTEE MEMBERS :
  • DANIEL CARLOS FERREIRA LANZA
  • EUZEBIO GUIMARAES BARBOSA
  • SÁVIO TORRES DE FARIAS
  • Date: Nov 19, 2018


  • Show Abstract
  • In silico characterization has been employed as a more accessible alternative for the prediction of protein sequences that cannot be reproduced in vitro or whose structures cannot be crystallized, and it can also provide data that complement experimental approaches. The virus that causes white spot syndrome (WSSV) is one of the biggest problems facing global shrimp farming, causing considerable economic damage. Although the effects of the virus on shrimp cultures are well known, to date there is little information on the mechanisms of viral infection and replication, mainly because many of its coding sequences show no homology with known sequences. In addition, the WSSV genome has some coding regions that vary between isolates and have not been functionally characterized to date, called ORF75, ORF94, ORF125, ORF23/24, and ORF14/15. This work aimed at the in silico characterization of the putative proteins encoded by the variable regions of the WSSV genome, in order to identify possible functions. Phylogenetic analyses were performed from the alignment of ten WSSV genomic sequences obtained from GenBank. The variable regions of ORF75, ORF94, and ORF125 were aligned, and the repeat units and SNPs were annotated using the Geneious platform. The amino acid sequences were subjected to remote homology searches and to searches for motifs and conserved domains, as well as to fold recognition and prediction of secondary and tertiary structures. It was possible to model tertiary structures of protein domains and to infer possible functions that include an RNA recognition motif associated with post-transcriptional processes between positions 70-150 of wsv477 (ORF23), an ankyrin repeat (ANK) motif acting in conjunction with a RING-H2 domain in the modulation of ubiquitin-dependent proteolysis in wsv249 (ORF125), repair helicases (wsv479, wsv497), an actin filament polymerization-associated protein (wsv463a), and an HA2 subunit of influenza virus hemagglutinin (wsv492).
It was also possible to detect signatures associated with nuclear localization signals within the repeat units of the amino acid sequences encoded by ORF75 and ORF94, which may be involved in the emission of signals to host cell nucleating proteins. We analyzed some regulatory regions 100 and 200 nt upstream of the coding regions and detected some motifs, including a zinc-finger binding site, suggesting interaction with possible transcription factors. Based on these results, an action model was proposed for each of the proteins studied.

14
  • THAYNÃ NHAARA OLIVEIRA DAMASCENO
  • All purpose word pairing tool: Easy interaction networks for clinical data.

  • Advisor : EUZEBIO GUIMARAES BARBOSA
  • COMMITTEE MEMBERS :
  • EUZEBIO GUIMARAES BARBOSA
  • GILDERLANIO SANTANA DE ARAÚJO
  • RAND RANDALL MARTINS
  • TETSU SAKAMOTO
  • Date: Dec 18, 2018


  • Show Abstract
  • Big Data is a term used to characterize the growing volume of existing data on different topics, whether biomedical or not. Given the enormous volume of biological and biomedical data generated daily, one of the main barriers is the analysis of these data. This calls for the development and use of computational tools that allow the analysis of data through techniques such as Text Mining. Text Mining, a branch of Data Mining, can be defined as a method that allows the extraction of relevant information contained in text. In order to allow a differentiated analysis of data, whether clinical or not, a simple algorithm was developed that allows these data to be analyzed without the need for correlation with existing databases or the creation of new ones. Based on this algorithm, a web tool was developed so that anyone can access the algorithm (even without knowledge of computational techniques) and analyze their own data. The Integrate Paired Tool (IPT) algorithm was written in the R programming language and uses Data Mining and Text Mining techniques for analyzing clinical data, although its analyses are not restricted to this specific kind of data. IPT pairs terms by analyzing the frequency of data pairs in a user-supplied .csv file. The web tool itself was developed in JavaScript, HTML5, CSS, and PHP. The algorithm reads the .csv file and passes through it pairing its terms two by two, regardless of whether the columns have different sizes or are incomplete, until all columns are paired. After all the groupings, a value is assigned to each grouped pair by adding up all occurrences of the same pair, and another .csv file is generated containing the existing interactions and their respective frequencies.
After the relations and their frequencies of occurrence are computed, an interaction graph (generated in R) is shown on the web tool screen so that users can carry out their analyses, in addition to the .csv file with all interactions and frequencies. The graph and the table can contain varying amounts of information, depending on the percentage that the user chooses in the IPT tool. The .csv file with interaction and frequency data can also be used in other network visualization tools, such as Gephi. For the purposes of tool testing, neonatal data were used. The IPT proved to work well and met the objectives of the research. Future goals include hosting the tool on the page of the Graduate Program in Bioinformatics of UFRN, analyzing other data, and possibly integrating data pre-processing into the IPT itself.
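The pairing-and-counting procedure described above can be sketched as follows. This is a minimal Python illustration rather than the IPT code itself, which was written in R. The function names, the assumption that the first row is a header, the assumption that terms are paired within each row, and the reading of the user-chosen percentage as "keep the most frequent fraction of pairs" are all hypothetical.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def pair_frequencies(csv_text):
    """Count how often each pair of terms appears together in the same row.

    Assumes the first row is a header. Empty cells are skipped, so short or
    incomplete rows still contribute the pairs they do contain.
    """
    counts = Counter()
    rows = list(csv.reader(io.StringIO(csv_text)))
    for row in rows[1:]:
        # Sorting makes each pair order-independent: (a, b) equals (b, a).
        terms = sorted({cell.strip() for cell in row if cell.strip()})
        counts.update(combinations(terms, 2))
    return counts


def top_fraction(counts, fraction):
    """Keep the most frequent pairs, up to the given fraction of all pairs."""
    if not counts:
        return []
    keep = max(1, round(len(counts) * fraction))
    return counts.most_common(keep)


def to_csv(pairs):
    """Serialize (pair, frequency) items as an edge list usable by network tools."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["term_a", "term_b", "frequency"])
    for (a, b), n in pairs:
        writer.writerow([a, b, n])
    return out.getvalue()


if __name__ == "__main__":
    # Toy clinical table (hypothetical) with incomplete rows.
    sample = (
        "diagnosis,drug,outcome\n"
        "sepsis,ampicillin,discharged\n"
        "sepsis,ampicillin,\n"
        "jaundice,,discharged\n"
    )
    print(to_csv(top_fraction(pair_frequencies(sample), 0.25)))
```

The resulting three-column table has the shape of a weighted edge list, which is the kind of file that network visualization tools such as Gephi can import.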

Thesis
1
  • ANDRÉ LUÍS FONSECA FAUSTINO
  • Bioinformatics applied to oncology: Studies in the prospection of therapeutic targets, tumor antigens and in the dynamics of drug resistance.

  • Advisor : SANDRO JOSE DE SOUZA
  • COMMITTEE MEMBERS :
  • SANDRO JOSE DE SOUZA
  • GUSTAVO ANTONIO DE SOUZA
  • LUCYMARA FASSARELLA AGNEZ LIMA
  • DIRCE MARIA CARRARO
  • VALDIR BALBINO
  • Date: Nov 1, 2018


  • Show Abstract
  • Cancer research is a field with several branches, covering questions such as how tumor heterogeneity can be used as a treatment opportunity and how such alterations lead to poor prognosis and drug resistance. In this context, bioinformatics emerges as a tool to investigate which features could be used as a therapeutic strategy. In this thesis, we present three chapters that address distinct aspects of cancer research: i) the prospection of therapeutic targets; ii) the identification of possible tumor antigens; and iii) the understanding of mechanisms associated with drug resistance. In the first chapter, we present a catalog of cell surface proteins, herein called the surfaceome. Cell surface proteins are attractive targets for therapy due to their essential role in signaling pathways and their frequent dysregulation in cancer. The surfaceome catalog includes 3758 proteins, which were categorized based on the types of genetic alterations and their influence on short-term survival in several tumors. Furthermore, we investigated gene signatures and their association with survival rate. As a result, three genes (WNT5A, CNGA2, and IGSF9B) were proposed as markers of poor prognosis in breast cancer patients. The second chapter builds on data from a previous article published in 2017. Briefly, the original publication addressed the identification of cancer-testis antigens (CTAs) and their relation to prognosis in several tumor types. In this chapter, in turn, we present new putative tumor antigens from a genome-wide analysis. Next, we discuss strategies to prioritize cases and remove spurious results. In addition, we propose CTA combinations as a strategy to increase effectiveness in the development of anticancer vaccines. As a result, significant combinations were found among HEATR9, INSL3, GTSF1L, and HSF5, which cover on average 35% of patients.
Finally, the third chapter discusses a work in progress involving proteins associated with post-transcriptional regulation and how those proteins affect anticancer drug response. In particular, our findings open an interesting discussion about the expression of RBPs (RNA-binding proteins) and the response to anticancer drugs. We also compared the RBP findings with other transcription-related genes, such as transcription factors and lincRNAs. In conclusion, this thesis considers three fundamental aspects of cancer research, especially for the development of treatment and diagnosis strategies. Furthermore, two of these chapters are supported by international publications.

2017
Dissertations
1
  • IARA DANTAS DE SOUZA
  • LEAD POISONING METABOLIC MAP

  • Advisor : RODRIGO JULIANI SIQUEIRA DALMOLIN
  • COMMITTEE MEMBERS :
  • JOAO PAULO MATOS SANTOS LIMA
  • MAURO ANTONIO ALVES CASTRO
  • RODRIGO JULIANI SIQUEIRA DALMOLIN
  • VIVIANE SOUZA DO AMARAL
  • Date: Dec 14, 2017


  • Show Abstract
  • Lead is an important heavy metal used worldwide in several applications, especially in industry. People exposed to lead can develop a wide range of symptoms associated with lead poisoning. Many effects of lead poisoning have been reported in the literature, showing that it compromises whole-body health, with symptoms related to the cardiovascular, immune, bone, reproductive, hematological, renal, gastrointestinal, and nervous systems. However, the molecular targets of lead, as well as the pathways affected by lead poisoning, are not completely described. The aim of this study was to construct a map of the metabolic pathways impaired in lead poisoning by evaluating which biomolecules are directly affected by lead. Through manual literature curation, we identified proteins that physically interact with lead and subsequently determined the metabolic pathways in which those proteins are involved. In total, we identified 23 proteins involved in heme synthesis, calcium metabolism, and neurotransmission, among other biological systems, which helps to explain the wide range of lead poisoning symptoms.
