Computational prediction of host-pathogen protein-protein interactions

dc.contributor.advisorChristoffels, Alan
dc.contributor.advisorWitbooi, Peter
dc.contributor.authorAhmed, Ibrahim H.I.
dc.date.accessioned2017-10-25T12:52:36Z
dc.date.accessioned2024-05-17T07:57:46Z
dc.date.available2017-10-25T12:52:36Z
dc.date.available2024-05-17T07:57:46Z
dc.date.issued2017
dc.descriptionPhilosophiae Doctor - PhDen_US
dc.description.abstractSupervised machine learning approaches have been applied successfully to the prediction of protein-protein interactions (PPIs) within a single organism, i.e., intra-species predictions. However, because large amounts of experimentally validated PPI data for training and testing are lacking, fewer studies have successfully applied these techniques to host-pathogen PPIs, i.e., inter-species predictions. Most host-pathogen studies have focused on human-virus interactions, and specifically on human-HIV PPI data. Further improvements to machine learning techniques and feature sets are needed to improve classification accuracy for host-pathogen PPI prediction. The primary aim of this bioinformatics thesis was to develop a binary classifier with an appropriate feature set for host-pathogen PPI prediction using published human-Hepatitis C virus PPI data, and to test the model on available host-pathogen data for human-Bacillus anthracis PPIs. Twelve different feature sets were compared to find the optimal set. The feature selection process revealed that our novel quadruple feature (a subsequence of four consecutive amino acids), combined with sequence similarity and human interactome network properties (such as degree, clustering coefficient, and betweenness centrality), formed the best set. The optimal feature set outperformed those in the relevant published material, giving 95.9% sensitivity, 91.6% specificity and 89.0% accuracy. Using our optimal feature set, we developed a neural network model to predict PPIs between human and Mycobacterium tuberculosis. The strategy was to develop a model trained with intra-species PPI data and extend it to inter-species prediction.
However, the lack of experimentally validated PPI data between human and Mycobacterium tuberculosis (M. tuberculosis) led us to first assess the feasibility of using validated intra-species PPI data to build a model for inter-species PPI prediction. In this model we combined human intra-species PPI data with Bacillus anthracis intra-species data to develop a binary classification model, and extended the model to human-Bacillus anthracis inter-species prediction. We tested our hypothesis on known human-Bacillus anthracis PPI data, and the results show good performance, with an average accuracy of 89.0%. The same approach was extended to the prediction of PPIs between human and M. tuberculosis. The predicted human-M. tuberculosis PPI data were further validated using functional enrichment of experimentally verified secretory proteins in M. tuberculosis, cellular compartment analysis and pathway enrichment analysis. Results show that five of the M. tuberculosis secretory proteins within an infected host macrophage that correspond to the mycobacterial virulent strain H37Rv were extracted from the human-M. tuberculosis PPI dataset predicted by our model. Finally, a web server was created to predict PPIs between human and Mycobacterium tuberculosis, which is available online at http://hppredict.sanbi.ac.za. In summary, the concepts, techniques and technologies developed as part of this thesis have the potential to contribute not only to the understanding of PPIs between human and Mycobacterium tuberculosis, but can also be extended to other pathogens. Further materials related to this study are available at ftp://ftp.sanbi.ac.za/machine learning.en_US
dc.description.sponsorshipNational Research Foundation (NRF) and SANBIen_US
dc.identifier.urihttps://hdl.handle.net/10566/15269
dc.language.isoenen_US
dc.publisherUniversity of the Western Capeen_US
dc.rights.holderUniversity of the Western Capeen_US
dc.subjectMycobacterium tuberculosisen_US
dc.subjectBacillus anthracisen_US
dc.subjectProtein-protein interactionsen_US
dc.subjectMachine learningen_US
dc.subjectSupport vector machinesen_US
dc.titleComputational prediction of host-pathogen protein-protein interactionsen_US
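
The abstract describes the quadruple feature only as "a subsequence of four consecutive amino acids". As a rough illustration of that idea, the sketch below counts overlapping four-residue windows in a protein sequence and converts the counts to relative frequencies. The function names, the restriction to the 20 standard residues, and the frequency normalisation are assumptions made for this example; the thesis may encode the feature differently.

```python
from collections import Counter

# The 20 standard amino acid one-letter codes (assumption: windows with
# any other symbol, such as X or B, are skipped).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def quadruple_counts(sequence: str) -> Counter:
    """Count each overlapping window of four consecutive standard residues."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - 3):
        window = seq[i:i + 4]
        # Ignore windows that contain non-standard or ambiguous codes.
        if all(residue in AMINO_ACIDS for residue in window):
            counts[window] += 1
    return counts


def quadruple_frequencies(sequence: str) -> dict:
    """Relative frequency of each observed quadruple (hypothetical helper)."""
    counts = quadruple_counts(sequence)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {quad: n / total for quad, n in counts.items()}
```

Calling `quadruple_frequencies` on a host and a pathogen sequence gives two sparse vectors that could be concatenated with other features, such as sequence similarity or network properties, before training a classifier.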

Files

Original bundle
Name: Ahmed_ihi_phd_ns_2017.pdf
Size: 10.14 MB
Format: Adobe Portable Document Format
Description: PhD