Analyzing Microsporidia sp. MB from sequence to biology: comprehensive exploration of the genome, protein structures, and functions through extensive bioinformatics analysis
- Authors: Ang'ang'o, Lilian Mbaisi
- Date: 2024-10-11
- Subjects: Microsporidia , Whole genome sequencing , Proteins Structure , Symbiont , Malaria Prevention , Vector control
- Language: English
- Type: Academic theses , Doctoral theses , text
- Identifier: http://hdl.handle.net/10962/466480 , vital:76734 , DOI https://doi.org/10.21504/10962/466480
- Description: Microsporidia are spore-forming intracellular organisms classified as the earliest-diverging group within the kingdom Fungi. Microsporidia infect a wide range of hosts, including both vertebrates and invertebrates. The pathogenicity of microsporidia depends on their species and the host species they infect. Due to their obligate intracellular nature, microsporidia have evolved extensively, as illustrated by their highly variable genome sizes and gene content. Being minimalist eukaryotes, microsporidia have genomes that are often associated with extreme gene reduction and compaction. However, these microorganisms retain particular genes that help them acquire specific host nutrients, thereby relying heavily on their host for survival and proliferation. The mode of sexual reproduction of microsporidia has not been well studied. Harnessing microsporidia in the laboratory is often a challenge; however, advances in computational tools have made it cheaper and quicker to accurately predict and annotate these organisms to understand their mechanism of infection. Understanding the protein structure and function of these unique organisms is the baseline for providing insights into their biology and survival in their respective hosts. Microsporidia genomes contain a large proportion of hypothetical proteins whose functions are not described. Vittaforma corneae ATCC 50505 was used as a model to highlight the functions and structure of these otherwise unknown proteins. A systematic annotation pipeline employing exhaustive computational tools was devised to carefully annotate the hypothetical proteins of V. corneae, aiming to characterize their structure and function. The genome of the novel microsporidian, Microsporidia sp. MB, a Plasmodium-transmission-blocking symbiont isolated from Anopheles mosquitoes in Sub-Saharan Africa, was sequenced, assembled, and annotated.
The genome was found to contain over 2000 putative genes spanning its 5.9 Mb size and contained minimal repeats. Comparative phylogenomic analysis of Microsporidia sp. MB grouped this symbiont within the Enterocytozoonida (clade IV) microsporidia, clustering with its closest relative, V. corneae. Using robust computational techniques, prediction and characterization of the putative proteins of Microsporidia sp. MB were conducted. The decay of several proteins in the glycolytic pathway is one unique characteristic associated with microsporidia. The proteins retained or lost often vary across the microsporidian taxon. This study highlights the retention of most of the proteins involved in the glycolytic pathway in Microsporidia sp. MB. The available genome dataset of Microsporidia sp. MB was further used to infer its mode of sexual reproduction. The symbiont appears to have several meiosis-related gene orthologs, suggesting that it is capable of sexual reproduction. These findings describe the basic biology of Microsporidia sp. MB and provide a basis for future next-generation sequencing and RNA sequencing experiments, ultimately informing the application of this microorganism as a biological malaria control tool. , Thesis (PhD) -- Faculty of Science, Biochemistry, Microbiology and Bioinformatics, 2024
- Date Issued: 2024-10-11
Bioinformatics tool and web server development focusing on structural bioinformatics applications
- Authors: Nabatanzi, Margaret
- Date: 2022-10-14
- Subjects: Structural bioinformatics , Proteins Structure , Protein structure prediction , Proteins Conformation , Protein complex
- Language: English
- Type: Academic theses , Doctoral theses , text
- Identifier: http://hdl.handle.net/10962/365700 , vital:65777 , DOI https://doi.org/10.21504/10962/365700
- Description: This thesis is divided into two main sections. Part 1 describes the design and accuracy evaluation of a new web server, PRotein Interactive MOdeling (PRIMO-Complexes), for modeling protein complexes and biological assemblies. The second part describes the development of bioinformatics tools to predict HIV-1 drug resistance and support bioinformatics research and education. Recent technological advances have resulted in a tremendous increase in the number of sequences and protein structures deposited in the Universal Protein Resource Knowledgebase (UniProtKB) and the Protein Data Bank (PDB). However, the number of sequences has increased at a higher rate than the number of experimentally solved multimeric protein structures. This is partly due to advances in high-throughput sequencing technology. To fill this protein sequence-structure gap, computational approaches have been developed to predict protein structures from available sequences. Computational approaches include template-based and ab initio modeling, with the former being the more reliable. The template-based modeling process can be carried out using either standalone software or automated modeling web servers. However, using standalone software requires familiarity with command-line interfaces as well as with other intermediate programs, which could be daunting to novice users. To alleviate some of these problems, the modeling process has been automated; however, it still has numerous challenges. To date, only a few web servers that support multimeric protein modeling have been developed, and even these provide little, if any, user involvement in the process. To address some of these issues, a new web server, PRIMO-Complexes, was developed to model protein complexes and biological assemblies. The existing PRIMO web server could only model monomeric proteins. Part 1 of this thesis provides a detailed account of the development and evaluation of PRIMO-Complexes.
The rationale for developing this new web server was based on the understanding that most proteins function as multimers and that ligand-binding sites and enzyme active sites are often located at the protein-protein interfaces. This necessitated developing capabilities for modeling multimeric proteins. The PRIMO-Complexes web server was developed using the Waterfall system development life cycle model, is based on the Django web framework, and makes use of high-performance computing resources to execute jobs. The accuracy of the algorithms embedded in PRIMO-Complexes was evaluated and the results were promising. Additionally, PRIMO-Complexes performs comparatively well in relation to other web servers that offer multimeric protein modeling. Another unique feature of PRIMO-Complexes is its interactivity. The web server was developed with capabilities for allowing users to model multimeric proteins with an appreciable degree of control over the process. In the second part of the thesis, several other bioinformatics tools are described, for example, a web server for predicting HIV-1 drug resistance, the RUBi protein model repository, and a bioinformatics web portal for education and research resources. The RUBi protein model repository stores verified theoretical models built using various modeling approaches. This enables users to easily access models to reproduce and/or further the research. This is described in chapter 5. Chapter 6 describes the design and development of the Human Immunodeficiency type 1 Resistance Predictor (HIV-1 ResPredictor), a web application that employs artificial neural networks (ANN) to predict drug resistance in patients infected with HIV-1 subtype B. The ANNs and subtype classifiers performed well, making this web application potentially useful to both clinicians and researchers in this era of personalised medicine.
Finally, chapter 7 describes a bioinformatics education web portal that equips students with information on how to use bioinformatics online resources. Being aware of these resources is not enough without a deeper understanding and guidance on how to apply bioinformatics methods to solve practical problems. This web portal was aimed at familiarising students with the basic terminology and approaches in structural bioinformatics. Students will potentially gain skills to conduct real-life bioinformatics research to obtain biological insights. , Thesis (PhD) -- Faculty of Science, Biochemistry and Microbiology, 2022
- Date Issued: 2022-10-14