Analysis of predictive power of binding affinity of PBM-derived sequences
- Authors: Matereke, Lavious Tapiwa
- Date: 2015
- Subjects: Transcription factors , Protein binding , DNA-binding proteins , Chromatin , Protein microarrays
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:4161 , http://hdl.handle.net/10962/d1018666
- Description: A transcription factor (TF) is a protein that binds to specific DNA sequences as part of the initiation stage of transcription. Various methods of finding these transcription factor binding sites (TFBS) have been developed. In vivo technologies analyze DNA regions known to have been bound by a TF in a living cell. The most widely used in vivo methods at the moment are chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) and DNase I hypersensitive sites sequencing. In vitro methods derive TFBS from experiments with TFs and DNA, usually in artificial settings, or computationally. An example is the protein binding microarray (PBM), which uses artificially constructed DNA sequences to determine the short sequences that are most likely to bind to a TF. The major drawback of this approach is that binding of TFs in vivo also depends on other factors such as chromatin accessibility and the presence of cofactors. Therefore, TFBS derived from the PBM technique might not resemble the true DNA binding sequences. In this work, we use PBM data from the UniPROBE motif database, ChIP-seq data and DNase I hypersensitive sites data. Using Spearman’s rank correlation and the area under the receiver operating characteristic curve, we compare the enrichment scores that the PBM approach assigns to its identified sequences with the frequency of these sequences in likely binding regions and in the human genome as a whole. We also use central motif enrichment analysis (CentriMo) to compare the enrichment of UniPROBE motifs with that of in vivo derived motifs (from the JASPAR CORE database) in their respective TF ChIP-seq peak regions. CentriMo is applied to 14 TF ChIP-seq peak regions from different cell lines. We aim to establish whether there is a relationship between the occurrences of UniPROBE 8-mer patterns in likely binding regions and their enrichment scores, and how well the in vitro derived motifs match in vivo binding specificity.
We did not find a consistent trend showing failure of the PBM approach to predict in vivo binding specificity. Our results show that the PBM technique fails to predict binding for Ets1, Hnf4a and Tcf3 in terms of our Spearman’s rank correlation for ChIP-seq data and central motif enrichment analysis. However, the PBM technique matched the in vivo binding specificities of FoxA2, Pou2f2 and Mafk. Failure of the PBM approach was found to result from variability in the TF’s binding specificity, the presence of cofactors, narrow binding specificity and the presence of ubiquitous binding patterns.
- Full Text:
- Date Issued: 2015
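
The comparison described in the abstract above (PBM enrichment scores against how often each 8-mer occurs in likely binding regions) can be illustrated with a minimal sketch. It is not the thesis's code: the function names, the choice to ignore reverse complements, and any input data are assumptions made for illustration. It uses only the Python standard library and computes Spearman's rank correlation as the Pearson correlation of tie-averaged ranks.

```python
from collections import Counter
from math import sqrt


def count_kmers(sequences, k=8):
    """Count every overlapping k-mer in a list of DNA sequences.

    Assumption: forward strand only; reverse complements are not merged.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def score_vs_frequency(enrichment_scores, sequences, k=8):
    """Correlate per-k-mer enrichment scores with k-mer counts in sequences.

    enrichment_scores: dict mapping k-mer string to score (hypothetical input).
    """
    counts = count_kmers(sequences, k)
    kmers = sorted(enrichment_scores)
    return spearman([enrichment_scores[m] for m in kmers],
                    [counts.get(m, 0) for m in kmers])
```

A rho near 1 would mean that 8-mers ranked highly by the in vitro scores are also the ones seen most often in the supplied regions. Handling both strands and comparing against a genome-wide background are left out to keep the sketch short.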
A central enrichment-based comparison of two alternative methods of generating transcription factor binding motifs from protein binding microarray data
- Authors: Mahaye, Ntombikayise
- Date: 2013 , 2013-03-13
- Subjects: Transcription factors , Bioinformatics , Protein binding , Protein microarrays , Cell lines
- Language: English
- Type: Thesis , Masters , MSc
- Identifier: vital:3890 , http://hdl.handle.net/10962/d1003049
- Description: Characterising transcription factor binding sites (TFBS) is an important problem in bioinformatics, since predicting binding sites has many applications such as predicting gene regulation. ChIP-seq is a powerful in vivo method for generating genome-wide putative binding regions for transcription factors (TFs). CentriMo is an algorithm that measures central enrichment of a motif and has previously been used as a motif enrichment analysis (MEA) tool. CentriMo uses the fact that ChIP-seq peak calling methods are likely to be biased towards the centre of the putative binding region, at least in cases where there is direct binding. CentriMo calculates a binomial p-value representing central enrichment, based on the central bias of the binding site with the highest likelihood ratio. In cases where binding is indirect or involves cofactors, a more complex distribution of preferred binding sites may occur but, in many cases, a low CentriMo p-value and a low width of maximum enrichment (about 100bp) are strong evidence that the motif in question is the true binding motif. Several other MEA tools have been developed, but they do not consider motif central enrichment. The study investigates the claim made by Zhao and Stormo (2011) that they have identified a simpler method than that used to derive the UniPROBE motif database for creating motifs from protein binding microarray (PBM) data, which they call BEEML-PBM (Binding Energy Estimation by Maximum Likelihood-PBM). To accomplish this, CentriMo is employed on 13 motifs from both motif databases. The results indicate that there is no conclusive difference in the quality of motifs from the original PBM and BEEML-PBM approaches. CentriMo provides an understanding of the mechanisms by which TFs bind to DNA. Out of 13 TFs for which ChIP-seq data is used, BEEML-PBM reports five better motifs, and in two cases it shows no central enrichment where the best PBM motif does.
The PBM approach finds seven motifs with better central enrichment. On the other hand, across all variations, the number of examples where PBM is better is not high enough to conclude that it is overall the better approach. Some TFs bind directly to DNA, while others bind indirectly or in combination with other TFs. Some of the predicted mechanisms are supported by literature evidence. This study further revealed that the binding specificity of a TF differs between cell types and developmental stages. A TF is up-regulated in a cell line where it performs its biological function. The discovery of cell line differences, not previously made in any CentriMo study, is interesting and provides reasons to study this further.
- Full Text:
- Date Issued: 2013
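
The binomial test for central enrichment mentioned in the abstract above can be sketched in a few lines. This is a simplified illustration, not the CentriMo implementation: it assumes a single fixed central window, equal-length sequences, one best site per sequence, and a uniform null distribution over site positions. All function and parameter names are hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


def central_enrichment_p(site_starts, seq_len, motif_len, window):
    """P-value for best sites clustering in a central window.

    site_starts: 0-based start of the best-scoring site in each sequence.
    seq_len:     common length of the peak-centred sequences (assumption).
    motif_len:   width of the motif.
    window:      width of the central window being tested.
    """
    centre = seq_len / 2

    def in_window(start):
        # A site is central if its midpoint lies within window/2 of the centre.
        return abs(start + motif_len / 2 - centre) <= window / 2

    # Null model: the best site is equally likely at every possible start.
    possible = range(seq_len - motif_len + 1)
    p_null = sum(in_window(s) for s in possible) / len(possible)

    k = sum(in_window(s) for s in site_starts)
    return binomial_tail(k, len(site_starts), p_null)
```

A small p-value for a narrow window is the pattern the abstract describes as evidence of direct binding. Choosing the window width and correcting for testing several widths are outside the scope of this sketch.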