Sequence, structure, dynamics, and substrate specificity analyses of bacterial Glycoside Hydrolase 1 enzymes from several activities
- Authors: Veldman, Wayde Michael
- Date: 2022-04-08
- Subjects: Glycosidases , Bioinformatics , Molecular dynamics , Ligands (Biochemistry) , Enzymes , Ligand binding (Biochemistry) , Sequence alignment (Bioinformatics) , Structural bioinformatics
- Language: English
- Type: Doctoral thesis , text
- Identifier: http://hdl.handle.net/10962/233805 , vital:50129 , DOI 10.21504/10962/233810
- Description: Glycoside hydrolase 1 (GH1) enzymes are a ubiquitous family of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. Despite their conserved catalytic domain, these enzymes have many different enzyme activities and/or substrate specificities, as a change of only a few residues in the active site can alter their function. Most GH1 active site residues are situated in loop regions, and it is known that enzymes are more likely to develop new functions (broad specificity) if they possess an active site with a high proportion of loops. Furthermore, the GH1 active site consists of several subsites, and cooperative binding makes the binding affinity of sites difficult to measure because the properties of one subsite are influenced by the binding of the other subsites. Extensive knowledge of protein-ligand interactions is critical to the comprehension of biology at the molecular level. However, the structural determinants and molecular details of GH1 ligand specificity and affinity are very broad, highly complex, not well understood, and therefore still need to be clarified. The aim of this study was to computationally characterise the activity of three newly solved GH1 crystallographic structures sent to us by our collaborators, and to provide evidence for their ligand-binding specificities. In addition, the differences in structural and biochemical contributions to enzyme specificity and/or function between different GH1 activities/enzymes were assessed, and the sequence/structure/function relationship of several activities of GH1 enzymes was analysed and compared. To accomplish the research aims, sequence analyses involving sequence identity, phylogenetics, and motif discovery were performed. As protein structure is more conserved than sequence, the discovered motifs were mapped to 3D structures for structural analysis and comparisons. 
To obtain information on enzyme mechanism or mode of action, as well as structure-function relationship, computational methods such as docking, molecular dynamics, binding free energy calculations, and essential dynamics were implemented. These computational approaches can provide information on the active site, binding residues, protein-ligand interactions, binding affinity, conformational change, and most structural or dynamic elements that play a role in enzyme function. The three new structures received from our collaborators are the first GH1 crystallographic structures from Bacillus licheniformis ever determined. As phospho-glycoside compounds were unavailable for purchase for use in activity assays, and as the active sites of the structures contained no ligand, in silico docking and MD simulations were performed to provide evidence for their GH1 activities and substrate specificities. First, though, the amino acid sequences of all known characterised bacterial GH1 enzymes were retrieved from the CAZy database and compared to the sequences of the three new B. licheniformis crystallographic structures, which provided evidence of the putative 6Pβ-glucosidase activity of enzyme BlBglH, and dual 6Pβ-glucosidase/6Pβ-galactosidase (dual-phospho) activity of enzymes BlBglB and BlBglC. As all three enzymes were determined to be putative 6Pβ-glycosidase activity enzymes, much of the thesis focused on the overall analysis and comparison of the 6Pβ-glucosidase, 6Pβ-galactosidase, and dual-phospho activities that make up the 6Pβ-glycosidases. The 6Pβ-glycosidase active site residues were identified through consensus of binding interactions using all known 6Pβ-glycosidase PDB structures complexed with complete ligand substrates. 
With regard to the 6Pβ-glucosidase activity, it was found that the L8b loop is longer and forms extra interactions with the L8a loop, likely leading to increased L8 loop rigidity. This would prevent the displacement of residue Ala423, ensuring a steric clash with galacto-configured ligands, and may engender substrate specificity for gluco-configured ligands only. Also, during molecular dynamics simulations using enzyme BlBglH (6Pβ-glucosidase activity), it was revealed that the favourable binding of substrate stabilises the loops that surround and make up the enzyme active site. Using the BlBglC (dual-phospho activity) enzyme structure with either galacto- (PNP6Pgal) or gluco-configured (PNP6Pglc) ligands, MD simulations in triplicate revealed important details of the broad specificity of dual-phospho activity enzymes. The ligand O4 hydroxyl position is the only difference between PNP6Pgal and PNP6Pglc, and it was found that residues Gln23 and Trp433 bind strongly to the ligand O3 hydroxyl group in the PNP6Pgal-enzyme complex, but to the ligand O4 hydroxyl group in the PNP6Pglc-enzyme complex. Also, His124 formed many hydrogen bonds with the PNP6Pgal O3 hydroxyl group but had none with PNP6Pglc. Conversely, residues Tyr173, Tyr301, Gln302 and Thr321 formed hydrogen bonds with PNP6Pglc but not PNP6Pgal. Lastly, using multiple 3D structures from various GH1 activities, a large network of conserved interactions between active site residues (and other important residues) was uncovered, which most likely stabilise the loop regions that contain these residues, helping to retain their positions needed for binding molecules. On the other hand, there exist several differing residue-residue interactions when comparing each of the activities, which could contribute towards individual activity substrate specificity by causing slightly different overall structure and malleability of the active site. 
Altogether, the findings in this thesis shed light on the function, mechanisms, dynamics, and ligand-binding of GH1 enzymes – particularly of the 6Pβ-glycosidase activities. , Thesis (PhD) -- Faculty of Science, Biochemistry and Microbiology, 2022
- Full Text:
- Date Issued: 2022-04-08
Computer aided approaches against Human African Trypanosomiasis
- Authors: Kimuda, Magambo Phillip
- Date: 2020
- Subjects: African trypanosomiasis , African trypanosomiasis -- Chemotherapy , Genomics , Macrophage migration inhibitory factor , Trypanosoma brucei , Pteridines , Tetrahydrofolate dehydrogenase , Adenylic acid , Molecular dynamics , Principal components analysis , Bioinformatics , Single nucleotide polymorphisms , Single Nucleotide Variants , Candidate Gene Association Study (CGAS)
- Language: English
- Type: Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/142542 , vital:38089
- Description: The thesis presented here is divided into two parts under a common theme: the use of computer-based tools, genomics, and in vitro experiments to develop innovative ways of tackling Human African Trypanosomiasis (HAT). Part I of this thesis focused on the human host genetic determinants while Part II focused on the discovery of novel chemotherapeutics against the parasite. Part I is further sub-divided into two parts: The first involves a Candidate Gene Association Study (CGAS) on an African population to identify genetic determinants associated with disease and/or susceptibility to HAT. The second involves studying the effects of missense Single Nucleotide Variants (SNVs) on protein structure, dynamics, and function using Macrophage Migration Inhibitory Factor (MIF) as a case study. Part II is also sub-divided into two parts: The first involves computer-based rational drug discovery of potential inhibitors against the Trypanosoma folate pathway, particularly by targeting Trypanosoma brucei Pteridine Reductase (TbPTR1), which is an enzyme used by trypanosomes to overcome T. brucei Dihydrofolate Reductase (TbDHFR) inhibition. The second involves the derivation of CHARMM force-field parameters that can be used to accurately model the geometry and dynamics of the T. brucei Phosphodiesterase B1 enzyme (TbrPDEB1) bimetallic active site center. The derived parameters were then used in MD simulations to characterise protein-ligand residue interactions that are important in TbrPDEB1 inhibition with the goal of targeting the cyclic Adenosine Monophosphate (cAMP) signalling pathway. In the CGAS we were unable to detect any genetic associations in the Ugandan cohort analysed that passed correction for multiple testing, in spite of the study being sufficiently powered. Additionally, our study did not find the previously reported association of the Apolipoprotein L1 (APOL1) G2 allele with protection against acute HAT. 
Future investigations, for example Genome Wide Association Studies using larger sample sizes (>3000 cases and controls), are required. Macrophage migration inhibitory factor (MIF) is a cytokine that is important in both innate and adaptive immunity and has been shown to play a role in T. brucei pathogenicity using murine models. A total of 27 missense SNVs were modelled using homology modelling to create MIF protein mutants that were investigated using in silico effect prediction tools, molecular dynamics (MD), Principal Component Analysis (PCA), and Dynamic Residue Network (DRN) analysis. Our results demonstrate that mutations P2Q, I5M, P16Q, L23F, T24S, T31I, Y37H, H41P, M48V, P44L, G52C, S54R, I65M, I68T, S75F, N106S, and T113S caused significant conformational changes. Further, DRN analysis showed that residues P2, T31, Y37, G52, I65, I68, S75, N106, and T113 are part of a similar local residue interaction network with functional significance. These results show how polymorphisms such as missense SNVs can affect protein conformation, dynamics, and function. Trypanosomes are auxotrophic for folates and pterins but require them for survival, so they scavenge them from their hosts. PTR1 is a multifunctional enzyme, unique to trypanosomatids, that reduces both pterins and folates. In the presence of DHFR inhibitors, PTR1 is over-expressed, thus providing an escape from the effects of DHFR inhibition. Both TbPTR1 and TbDHFR are pharmacologically and genetically validated drug targets. In this study 5742 compounds were screened using molecular docking, and 13 promising binding modes were further analysed using MD simulations. The trajectories were analysed using RMSD, Rg, RMSF, PCA, Essential Dynamics Analysis (EDA), Molecular Mechanics Poisson–Boltzmann surface area (MM-PBSA) binding free energy calculations, and DRN analysis. 
The computational screening approach allowed us to identify five compounds, named RUBi004, RUBi007, RUBi014, RUBi016 and RUBi018, that exhibited antitrypanosomal growth activities against trypanosomes in culture with IC50 values of 12.5 ± 4.8 μM, 32.4 ± 4.2 μM, 5.9 ± 1.4 μM, 28.2 ± 3.3 μM, and 9.7 ± 2.1 μM, respectively. Further, when used in combination with WR99210, a known TbDHFR inhibitor, RUBi004, RUBi007, RUBi014 and RUBi018 showed antagonism while RUBi016 showed an additive effect. These results indicate that the four compounds might be competing with TbDHFR while RUBi016 might be more specific for TbPTR1. These compounds provide scaffolds that can be further optimised to improve their potency and specificity. Lastly, using a systematic approach we derived CHARMM force-field parameters to accurately describe the TbrPDEB1 bi-metal catalytic center. For dynamics, we employed a mixed bonded and non-bonded approach. We optimised the structure using a two-layer QM/MM ONIOM (B3LYP/6-31(g): UFF). The TbrPDEB1 bi-metallic center bonds, angles, and dihedrals were parameterized by fitting the energy profiles from Potential Energy Surface (PES) scans to the CHARMM potential energy function. The parameters were validated by means of MD simulations and analysed using RMSD, Rg, RMSF, hydrogen bonding, bond/angle/dihedral evaluations, EDA, PCA, and DRN analysis. The force-field parameters were able to accurately reproduce the geometry and dynamics of the TbrPDEB1 bi-metal catalytic center during MD simulations. Molecular docking was used to identify 6 potential hits that inhibited trypanosome growth in vitro. The derived force-field parameters were used to simulate the 6 protein-ligand complexes with the aim of elucidating crucial protein-ligand residue interactions. Using the most potent ligand, RUBi022, which had an IC50 of 14.96 μM, we were able to identify key residue interactions that can be of use in in silico prediction of potential TbrPDEB1 inhibitors. 
Overall, we demonstrate how bioinformatics tools can complement current disease eradication strategies. Future work will focus on identifying variants in Genome Wide Association Studies and partnering with wet labs to carry out further enzyme-ligand activity relationship studies, structure determination or characterisation of appropriate protein-ligand complexes by crystallography, and site-specific mutation studies.
- Full Text:
- Date Issued: 2020
Bioinformatics tool development with a focus on structural bioinformatics and the analysis of genetic variation in humans
- Authors: Brown, David K
- Date: 2018
- Subjects: Bioinformatics , Human genetics -- Variation , High performance computing , Workflow management systems , Molecular dynamics , Next generation sequencing , Human Mutation Analysis (HUMA)
- Language: English
- Type: text , Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/60708 , vital:27820
- Description: This thesis is divided into three parts, united under the general theme of bioinformatics tool development and variation analysis. Part 1 describes the design and development of the Job Management System (JMS), a workflow management system for high performance computing (HPC). HPC has become an integral part of bioinformatics. Computational methods for molecular dynamics and next generation sequencing (NGS) analysis, which require complex calculations on large datasets, are not yet feasible on desktop computers. As such, powerful computer clusters have been employed to perform these calculations. However, making use of these HPC clusters requires familiarity with command line interfaces. This excludes a large number of researchers from taking advantage of these resources. JMS was developed as a tool to make it easier for researchers without a computer science background to make use of HPC. Additionally, JMS can be used to host computational tools and pipelines and generates both web-based interfaces and RESTful APIs for those tools. The web-based interfaces can be used to quickly and easily submit jobs to the underlying cluster. The RESTful web API, on the other hand, allows JMS to provide backend functionality for external tools and web servers that want to run jobs on the cluster. Numerous tools and workflows have already been added to JMS, several of which have been incorporated into external web servers. One such web server is the Human Mutation Analysis (HUMA) web server and database. HUMA, the topic of part 2 of this thesis, is a platform for the analysis of genetic variation in humans. HUMA aggregates data from various existing databases into a single, connected and related database. The advantages of this are realized in the powerful querying abilities that it provides. HUMA includes protein, gene, disease, and variation data and can be searched from the angle of any one of these categories. 
For example, searching for a protein will return the protein data (e.g. protein sequences, structures, domains and families, and other meta-data). However, the related nature of the database means that genes, diseases, variation, and literature related to the protein will also be returned, giving users a powerful and holistic view of all data associated with the protein. HUMA also provides links to the original sources of the data, allowing users to follow the links to find additional details. HUMA aims to be a platform for the analysis of genetic variation. As such, it also provides tools to visualize and analyse the data (several of which run on the underlying cluster, via JMS). These tools include alignment and 3D structure visualization, homology modeling, variant analysis, and the ability to upload custom variation datasets and map them to proteins, genes and diseases. HUMA also provides collaboration features, allowing users to share and discuss datasets and job results. Finally, part 3 of this thesis focused on the development of a suite of tools, MD-TASK, to analyse genetic variation at the protein structure level via network analysis of molecular dynamics simulations. The use of MD-TASK in combination with the tools developed in the previous parts of this thesis is showcased via the analysis of variation in the renin-angiotensinogen complex, a vital part of the renin-angiotensin system.
- Full Text:
- Date Issued: 2018
The investigation of type-specific features of the copper coordinating AA9 proteins and their effect on the interaction with crystalline cellulose using molecular dynamics studies
- Authors: Moses, Vuyani
- Date: 2018
- Subjects: Copper proteins , Cellulose , Molecular dynamics , Cellulose -- Biodegradation , Bioinformatics
- Language: English
- Type: text , Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/58327 , vital:27230
- Description: AA9 proteins are metallo-enzymes which are crucial for the early stages of cellulose degradation. AA9 proteins have been suggested to cleave glycosidic bonds linking cellulose through the use of their Cu2+ coordinating active site. AA9 proteins possess different regioselectivities depending on the resulting cleavage they form and, as a result, are grouped accordingly. Type 1 AA9 proteins cleave the C1 carbon of cellulose while Type 2 AA9 proteins cleave the C4 carbon and Type 3 AA9 proteins cleave either C1 or C4 carbons. The steric congestion of the AA9 active site has been proposed to be a contributor to the observed regioselectivity. As such, a bioinformatics characterisation of type-specific sequence and structural features was performed. Initially, AA9 protein sequences were obtained from the Pfam database and multiple sequence alignment was performed. The sequences were phylogenetically characterised and grouped into their respective types, and sub-groups were identified. A selection analysis was performed on AA9 LPMO types to determine the selective pressure acting on AA9 protein residues. Motif discovery was then performed to identify conserved sequence motifs in AA9 proteins. Once type-specific sequence features were identified, structural mapping was performed to assess possible effects on substrate interaction. Physicochemical property analysis was also performed to assess biochemical differences between AA9 LPMO types. Molecular dynamics (MD) simulations were then employed to dynamically assess the consequences of the discovered type-specific features on AA9-cellulose interaction. Due to the absence of AA9-specific force field parameters, MD simulations were not readily applicable. As a result, Potential Energy Surface (PES) scans were performed to evaluate the force field parameters for the AA9 active site using the PM6 semi-empirical approach and least squares fitting. 
A Type 1 AA9 active site was constructed from the crystal structure 4B5Q, encompassing only the Cu2+ coordinating residues, the Cu2+ ion and two water residues. Due to the similarity in AA9 active sites, the Type 1 force field parameters were validated on all three AA9 LPMO types. Two MD simulations for each AA9 LPMO type were conducted using two separate Lennard-Jones parameter sets. Once completed, the MD trajectories were analysed for various features including the RMSD, RMSF, radius of gyration, coordination during simulation, hydrogen bonding, secondary structure conservation and overall protein movement. Force field parameters were successfully evaluated and validated for AA9 proteins. MD simulations of AA9 proteins were able to reveal the presence of unique type-specific binding modes of AA9 active sites to cellulose. These binding modes were characterised by the presence of unique type-specific loops which were present in Type 2 and 3 AA9 proteins but not in Type 1 AA9 proteins. The loops were found to result in steric congestion that affects how the Cu2+ ion interacts with cellulose. As a result, Cu2+ binding to cellulose was observed for Type 1 and not Type 2 and 3 AA9 proteins. In this study, force field parameters were evaluated for the Type 1 active site of AA9 proteins, and these parameters were then validated on all three types and their cellulose binding. Future work will focus on identifying the nature of the reactive oxygen species and performing QM/MM calculations to elucidate the reaction mechanism of all three AA9 LPMO types.
- Full Text:
- Date Issued: 2018
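The abstract above mentions deriving force field parameters from Potential Energy Surface (PES) scans by least squares fitting, without giving the functional form. As a minimal sketch only, the following assumes a harmonic bond term E(r) = k(r - r0)^2 + e0 and uses synthetic, made-up scan points; the function name `fit_harmonic_bond`, the numerical values, and the units are illustrative assumptions and are not taken from the thesis.

```python
import numpy as np


def fit_harmonic_bond(distances, energies):
    """Least-squares fit of E(r) = k*(r - r0)**2 + e0 to PES scan points.

    Expanding the harmonic form gives a quadratic a*r**2 + b*r + c, so an
    ordinary polynomial least-squares fit recovers the parameters:
    k = a, r0 = -b / (2a), e0 = c - b**2 / (4a).
    """
    a, b, c = np.polyfit(distances, energies, deg=2)
    if a <= 0:
        raise ValueError("scan has no minimum; a harmonic fit is not meaningful")
    k = a
    r0 = -b / (2.0 * a)
    e0 = c - b**2 / (4.0 * a)
    return k, r0, e0


# Hypothetical scan: distances in angstrom, energies in kcal/mol.
# These numbers are invented for illustration, not taken from the thesis.
r = np.linspace(1.8, 2.2, 9)
true_k, true_r0 = 60.0, 2.0
e = true_k * (r - true_r0) ** 2

k, r0, e0 = fit_harmonic_bond(r, e)
print(f"k = {k:.3f}, r0 = {r0:.3f}, e0 = {e0:.3f}")
```

In practice the energies would come from the quantum-chemical scan (PM6 in the abstract) rather than from a known curve, and angle or dihedral terms would need their own functional forms.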