DEFENSE Committee: KARLA CRISTINA TABOSA MACHADO

A MASTER'S DEFENSE committee has been registered by the program.
STUDENT: KARLA CRISTINA TABOSA MACHADO
DATE: 27/07/2018
TIME: 09:00
LOCATION: Room B203, IMD
TITLE:

Development of computational approaches for prokaryote proteogenomics


KEYWORDS:

Proteomics, proteogenomics, mass spectrometry, prokaryotes, databases


PAGES: 60
BROAD AREA: Exact and Earth Sciences
AREA: Computer Science
ABSTRACT:

The development of next-generation sequencers caused a revolution in genomic research, and the complete genomic information of thousands of bacterial strains is now available. Similar technological breakthroughs in sensitivity and throughput also occurred over the last decade for protein analysis by mass spectrometry (MS). Proteomics has yet to reach the same level of throughput as genomics, but for samples from simple organisms such as yeasts or bacteria, it can detect and quantify the proteome close to completeness. Challenges remain in the characterization of coding regions in a genome and in the validation of genomic models. Scientific reports show that genomic annotations performed on the same genomic data using independent approaches diverge in the number of predicted ORFs and in their length (i.e., different choices for the transcription/translation initiation site). Peptide sequences characterized in proteomics samples can be used to validate genomic regions as coding, a research field known as proteogenomics. This requires customized sequence databases that allow the identification of new genomic regions previously predicted to be non-coding and therefore absent from routinely employed databases. In this work, a computational strategy was developed that builds customized protein sequence databases by processing and analyzing protein sequence data from several strains of the same bacterial species. The approach identifies and compares homologous and uniquely annotated proteins across all strains and reports those sequences in a non-redundant manner: sequences extensively repeated among annotations are reported only once, to keep the size of the search space under control. The databases also report sequence variations, whether they result from genetic variation or from annotation divergences, which are usually omitted from databases used in proteomic analysis.
In addition to the databases, a log file was created that describes each observation regarding the presence of homologs, sequence differences, modification type, and presence in strains. To evaluate whether the generated databases produced relevant sequences without loss of information relative to the original sequences, MS data collected from clinical strains of Mycobacterium tuberculosis were submitted to protein identification. The database created with this approach was compared with a database formed by simply concatenating all proteins annotated in M. tuberculosis. The customized database reduced the computational time, and the number of identifications obtained in the two searches was practically identical. Finally, databases were created for 10 bacterial species, each with at least 65 characterized strains. Analysis of these databases showed that the greater the diversity of the pangenome of a bacterial species, the greater the number of proteins and peptides expected. The results also demonstrate the possibility of using this strategy to create databases containing sequences from multiple species, in order to perform metaproteomic analyses of MS data.
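
The non-redundant reporting described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: it assumes exact sequence identity as a simplified stand-in for the homology comparison, and the function names, strain names, protein identifiers, and sequences are all hypothetical.

```python
from collections import defaultdict


def build_nonredundant_db(strains):
    """Collapse identical protein sequences across strains.

    `strains` maps a strain name to a dict of {protein_id: sequence}.
    Returns a list of (sequence, [(strain, protein_id), ...]) tuples in
    first-seen order, so each distinct sequence is reported only once.
    Assumption: exact identity replaces a real homology search.
    """
    seen = defaultdict(list)  # sequence -> list of (strain, protein_id)
    for strain, proteins in strains.items():
        for protein_id, sequence in proteins.items():
            seen[sequence.upper()].append((strain, protein_id))
    return [(seq, sorted(sources)) for seq, sources in seen.items()]


def to_fasta(entries):
    """Write entries as FASTA text, listing the strains in each header."""
    lines = []
    for index, (sequence, sources) in enumerate(entries, start=1):
        strain_names = sorted({strain for strain, _ in sources})
        lines.append(f">entry_{index} strains={','.join(strain_names)}")
        lines.append(sequence)
    return "\n".join(lines)


# Hypothetical toy input: two strains sharing one identical protein and
# carrying one variant that differs by a single residue.
example_strains = {
    "strain_A": {"p1": "MKTAYIAK", "p2": "MSLLTEVE"},
    "strain_B": {"q1": "MKTAYIAK", "q2": "MSLLTEVD"},
}
entries = build_nonredundant_db(example_strains)
fasta_text = to_fasta(entries)
```

In this toy input the shared sequence is written once with both strains in its header, while the two single-residue variants remain separate entries, which mirrors the idea of keeping sequence variation while limiting the search space.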


COMMITTEE MEMBERS:
Chair - 2267860 - GUSTAVO ANTONIO DE SOUZA
Internal - 1513597 - JOAO PAULO MATOS SANTOS LIMA
External to the Institution - LUCIANO FERNANDES HUERGO - UFPR
Notice registered on: 17/07/2018 16:16