Framework for expanding evolutionary plasticity and rooting analyses to different orthology databases
Orthology; phylogenetic trees; evolutionary plasticity; evolutionary rooting; package building; R language
Methods for reconstructing evolutionary scenarios are important tools for understanding biological systems from the perspective of their origins. These techniques rely primarily on the relationships among genes from different species that form gene families, or orthologous groups. Orthologs are genes in different species that descend from a single gene in their common ancestor. Several projects infer orthology relationships among genes from sequenced genomes and store them in orthology databases. Evolutionary plasticity determination and evolutionary root inference are well-established analyses based on the distribution of orthologous groups across a reference species tree. These analyses are available in the geneplast package from the Bioconductor repository for R. Although geneplast is a consolidated package, preparing its input data requires considerable effort. Here, we analyzed the structure of different orthology databases and proposed a framework for automating the extraction and processing of orthology information into the format required by geneplast. As a result, we produced four R annotation packages that provide input data for geneplast, extending its analyses to different sources of orthology information. The methods applied by this framework, as well as the data produced, are being consolidated in the geneplast.data package, which is being submitted to the Bioconductor repository.