Philosophiae Doctor - PhD (Bioinformatics)
Browsing by Author "Christoffels, Alan"

Baobab LIMS: An open source biobank laboratory information management system for resource-limited settings
(University of the Western Cape, 2019) Bendou, Hocine; Christoffels, Alan
A laboratory information management system (LIMS) is central to the informatics infrastructure that underlies biobanking activities. To date, a wide range of commercial and open source LIMS are available. The decision to opt for one LIMS over another is often influenced by the needs of the biobank clients and researchers, as well as by available financial resources. However, finding a LIMS that incorporates all possible requirements of a biobank may often be a complicated endeavour. The need to implement biobank standard operating procedures and to stimulate the use of standards for biobank data representation motivated the development of Baobab LIMS, an open source LIMS for biobanking. Baobab LIMS comprises modules for biospecimen kit assembly, shipping of biospecimen kits, storage management, analysis requests, reporting, and invoicing. Baobab LIMS is based on the Plone web-content management framework, a client-server system in which the end user accesses the system securely over the internet with a standard web browser, thereby eliminating the need for standalone installations on all machines. The Baobab LIMS components were tested and evaluated in three human biobanks. Testing the LIMS modules aided in mapping the biobanks' requirements to the LIMS functionalities and, furthermore, helped to reveal new user suggestions, such as enhancement of the online documentation. The user suggestions are demonstrated to be important for both strengthening the LIMS and biobank sustainability.
Ultimately, the practical LIMS evaluations showed that Baobab LIMS can be used to manage the operations of human biobanks with relatively different biobanking workflows.

Coding of tsetse repellents by olfactory sensory neurons: towards the improvement and the development of novel
(University of the Western Cape, 2020) Souleymane, Diallo; Christoffels, Alan
Tsetse flies are the biological vectors of human and animal trypanosomiasis and hence are of medical and veterinary importance. The sense of smell plays a significant role in tsetse flies and their ecological interactions, such as finding blood meal sources, resting and larviposition sites, and mates. Tsetse olfactory behaviour can be exploited for their management; however, olfactory studies in tsetse flies are still fragmentary. In this PhD thesis, using scanning electron microscopy, electrophysiology, behaviour, bioinformatics and molecular biology techniques, I investigated olfaction in the tsetse fly Glossina fuscipes fuscipes using behaviourally well-studied odorants, comparing a tsetse repellent with an attractant odour. Insect olfaction is mediated by olfactory sensory neurons (OSNs) located in olfactory sensilla, which are cuticular structures exposed to the environment through pores and which provide a platform for chemical communication. The sensillum shaft houses the dendrites of the OSNs, which are protected by the sensillum lymph; this lymph is produced by support cells and contains a variety of olfactory proteins, including odorant binding proteins (OBPs) and chemosensory proteins (CSPs). Olfactory receptors are expressed on the dendrites of the OSNs. In my PhD studies, I tried to decipher the sense of smell in the tsetse fly. In the second chapter, I demonstrated that G. f. fuscipes is equipped with diverse olfactory sensilla, which vary from basiconic to trichoid and coeloconic. I also demonstrated differences in shape, length and number between sensilla types, as well as sexual dimorphism.
There is a major difference between males and females: males have unique club-shaped basiconic sensilla in the pits, which are absent from female pits. In my third chapter, I investigated the odorant receptors (ORs), which are expressed on the dendrites of the olfactory sensory neurons (OSNs). G. f. fuscipes has 42 ORs, which had not been functionally characterised. I used behaviourally well-studied odorants: a tsetse repellent composed of a four-component blend. Using feeding bioassays, I demonstrated that the tsetse repellent is also a strong antifeedant for both G. pallidipes and G. f. fuscipes compared with the attractant odour, which adds to the value of the repellent. In contrast, the attractant odour enhanced the feeding index. Using DREAM (deorphanization of receptors based on expression alterations of mRNA levels), I found that in G. f. fuscipes, following a short in vivo exposure to the individual tsetse repellent components as well as an attractant volatile chemical, OSNs that respond to these compounds altered their mRNA expression in two opposite directions: significant downregulation or upregulation of the number of transcripts of the OR that they express and that interacts with the odorant. I also found that odorants with opposite valence already segregate distinctly at the cellular and molecular level at the periphery, that is, at the reception of odorants by OSNs, which is the basis of sophisticated olfactory behaviour. Deorphanization of ORs in non-model insects is a challenge. By combining DREAM with molecular dynamics (as docking scores), physiology and homology modelling with Drosophila, a well-studied model insect, I was able to predict putative receptors for the tsetse repellent components and an attractant odour. However, many ORs were neutral, showing that they were not activated by the odorants and demonstrating the selectivity of the technique as well as of the receptors.
In my fourth chapter, I investigated the structures of the OBPs and their interactions with odorant molecules. I demonstrated that OBPs are expressed in the antennae as well as in other tissues, such as the legs. I also demonstrated that OBP expression varies between tissues as well as between sexes. I further demonstrated that odorants induced a fast alteration in OBP mRNA expression in live insects: after 5 hours of exposure, some odorants decreased the transcription of genes corresponding to the activated OBP, others increased expression many-fold, and others were neutral. Moreover, subsequent behavioural data showed that the behavioural response of G. f. fuscipes toward 1-octen-3-ol decreased significantly when putative 1-octen-3-ol OBPs were silenced by feeding double-stranded RNA (dsRNA). In summary, our findings, whereby odorant exposure affects OBP mRNA levels and their physicochemical properties, and silencing of these OBPs affects the behavioural response, demonstrate that OBPs are involved in odour detection and affect the percept of a given odorant. The expression of OBPs in olfactory tissues such as the antennae, their interaction with odorants, and their effect on the behavioural response when silenced show their direct involvement in odour detection and reception. Furthermore, their expression in other tissues such as the legs indicates that they might also have a role in other physiological functions, such as taste.

Coding of tsetse repellents by olfactory sensory neurons: towards the improvement and the development of novel tsetse repellents
(University of the Western Cape, 2020) Souleymane, Diallo; Christoffels, Alan
Tsetse flies are the biological vectors of human and animal trypanosomiasis and hence are of medical and veterinary importance. The sense of smell plays a significant role in tsetse flies and their ecological interactions, such as finding blood meal sources, resting and larviposition sites, and mates.
Tsetse olfactory behaviour can be exploited for their management; however, olfactory studies in tsetse flies are still fragmentary. In this PhD thesis, using scanning electron microscopy, electrophysiology, behaviour, bioinformatics and molecular biology techniques, I investigated olfaction in the tsetse fly Glossina fuscipes fuscipes using behaviourally well-studied odorants, comparing a tsetse repellent with an attractant odour. Insect olfaction is mediated by olfactory sensory neurons (OSNs) located in olfactory sensilla, which are cuticular structures exposed to the environment through pores and which provide a platform for chemical communication. The sensillum shaft houses the dendrites of the OSNs, which are protected by the sensillum lymph; this lymph is produced by support cells and contains a variety of olfactory proteins, including odorant binding proteins (OBPs) and chemosensory proteins (CSPs). Olfactory receptors are expressed on the dendrites of the OSNs. In my PhD studies, I tried to decipher the sense of smell in the tsetse fly. In the second chapter, I demonstrated that G. f. fuscipes is equipped with diverse olfactory sensilla, which vary from basiconic to trichoid and coeloconic. I also demonstrated differences in shape, length and number between sensilla types, as well as sexual dimorphism. There is a major difference between males and females: males have unique club-shaped basiconic sensilla in the pits, which are absent from female pits. In my third chapter, I investigated the odorant receptors (ORs), which are expressed on the dendrites of the olfactory sensory neurons (OSNs). G. f. fuscipes has 42 ORs, which had not been functionally characterised. I used behaviourally well-studied odorants: a tsetse repellent composed of a four-component blend. Using feeding bioassays, I demonstrated that the tsetse repellent is also a strong antifeedant for both G. pallidipes and G. f. fuscipes compared with the attractant odour, which adds to the value of the repellent.
In contrast, the attractant odour enhanced the feeding index. Using DREAM (deorphanization of receptors based on expression alterations of mRNA levels), I found that in G. f. fuscipes, following a short in vivo exposure to the individual tsetse repellent components as well as an attractant volatile chemical, OSNs that respond to these compounds altered their mRNA expression in two opposite directions: significant downregulation or upregulation of the number of transcripts of the OR that they express and that interacts with the odorant. I also found that odorants with opposite valence already segregate distinctly at the cellular and molecular level at the periphery, that is, at the reception of odorants by OSNs, which is the basis of sophisticated olfactory behaviour. Deorphanization of ORs in non-model insects is a challenge. By combining DREAM with molecular dynamics (as docking scores), physiology and homology modelling with Drosophila, a well-studied model insect, I was able to predict putative receptors for the tsetse repellent components and an attractant odour. However, many ORs were neutral, showing that they were not activated by the odorants and demonstrating the selectivity of the technique as well as of the receptors. In my fourth chapter, I investigated the structures of the OBPs and their interactions with odorant molecules. I demonstrated that OBPs are expressed in the antennae as well as in other tissues, such as the legs. I also demonstrated that OBP expression varies between tissues as well as between sexes. I further demonstrated that odorants induced a fast alteration in OBP mRNA expression in live insects: after 5 hours of exposure, some odorants decreased the transcription of genes corresponding to the activated OBP, others increased expression many-fold, and others were neutral. Moreover, subsequent behavioural data showed that the behavioural response of G. f.
fuscipes toward 1-octen-3-ol decreased significantly when putative 1-octen-3-ol OBPs were silenced by feeding double-stranded RNA (dsRNA). In summary, our findings, whereby odorant exposure affects OBP mRNA levels and their physicochemical properties, and silencing of these OBPs affects the behavioural response, demonstrate that OBPs are involved in odour detection and affect the percept of a given odorant. The expression of OBPs in olfactory tissues such as the antennae, their interaction with odorants, and their effect on the behavioural response when silenced show their direct involvement in odour detection and reception. Furthermore, their expression in other tissues such as the legs indicates that they might also have a role in other physiological functions, such as taste.

Coding of tsetse repellents by olfactory sensory neurons: towards the improvement and the development of novel tsetse repellents
(University of Western Cape, 2021) Souleymane, Diallo; Christoffels, Alan
Tsetse flies are the biological vectors of human and animal trypanosomiasis and hence are of medical and veterinary importance. The sense of smell plays a significant role in tsetse flies and their ecological interactions, such as finding blood meal sources, resting and larviposition sites, and mates. Tsetse olfactory behaviour can be exploited for their management; however, olfactory studies in tsetse flies are still fragmentary. In this PhD thesis, using scanning electron microscopy, electrophysiology, behaviour, bioinformatics and molecular biology techniques, I investigated olfaction in the tsetse fly Glossina fuscipes fuscipes using behaviourally well-studied odorants, comparing a tsetse repellent with an attractant odour.
Insect olfaction is mediated by olfactory sensory neurons (OSNs) located in olfactory sensilla, which are cuticular structures exposed to the environment through pores and which provide a platform for chemical communication.

Computational characterisation of DNA methylomes in Mycobacterium tuberculosis Beijing hyper- and hypo-virulent strains
(University of the Western Cape, 2014) Naidu, Alecia Geraldine; Christoffels, Alan; Gey van Pittius, Nico
Mycobacterium tuberculosis, the causative agent of tuberculosis, is estimated to infect approximately one-third of the world’s population and is responsible for around 2 million deaths per year. The disease is endemic in South Africa, which has one of the world’s highest tuberculosis incidence and death rates. Strains of the M. tuberculosis Beijing genotype are characterised by enhanced virulence relative to other M. tuberculosis strains and are the predominant strains observed in the Western Cape of South Africa. DNA methylation is a largely untapped area of research in M. tuberculosis and has been poorly described in the literature, especially with respect to its connection to virulence, even though it is well characterised, along with its role in virulence, in other pathogenic bacteria such as E. coli. The overall aim was to characterise a global DNA methylation profile for two M. tuberculosis Beijing strains, one hyper-virulent and one hypo-virulent, using single molecule real time sequencing technology, and to determine whether adenine methylation in promoter regions has a possible functional role. This study identified and characterised the DNA methylation profile at single-nucleotide resolution in these strains using Pacific Biosciences single molecule real time sequencing data. A computational approach was used to discern DNA methylation patterns between the hyper- and hypo-virulent strains with a view to understanding virulence in the hyper-virulent strain.
Methylated motifs that belong to known Restriction Modification (RM) systems of the H37Rv reference genome were also identified. N6-methyladenine (m6A) and N4-methylcytosine (m4C) loci were identified in both strains. m6A was identified in both strains within the sequence motifs CACGCAG (Type II RM system) and GATNNNNRTAC/GTAYNNNNATC (Type I RM system), while the CTGGAGGA motif was found to be uniquely methylated in the hyper-virulent strain. Interestingly, the CACGCAG motif was significantly methylated (p = 9.9 x 10^-63) at a higher proportion in intergenic regions (~70%) than in genic regions in both the hyper-virulent and hypo-virulent strains, suggesting a role in gene regulation. There appeared to be a higher proportion of m6A occurring in intergenic regions than within genes for the hyper-virulent (61%) and hypo-virulent (62%) strains. The genic proportion revealed that 35% of total m6A occurred uniquely within genes for the hyper-virulent strain, compared with 27.9% for uniquely methylated genes in the hypo-virulent strain.

Computational characterization of IRE-regulated genes in Glossina morsitans
(University of Western Cape, 2013) Dashti, Zahra Jalali Sefid; Christoffels, Alan
Blood feeding is a habit exhibited by many insects. Considering the devastating impact of these insects on human health, it is important to focus research on understanding the biology behind blood feeding, disease transmission and host-pathogen interactions. Such knowledge would pave the way for developing efficient preventative measures. Iron, an important element for species survival, is at the center of events controlling the tsetse fly’s fitness and reproductive success. Hence, targeting genes involved in iron trafficking and sequestration would present a possible means of preventing disease transmission. Considering the dynamic and multi-factorial nature of iron metabolism, a well-coordinated regulatory system is expected to be at work.
Despite extensive literature on the mechanism of iron regulation and the key factors responsible for maintaining its homeostasis in humans, less attention has been given to understanding such a system in insects, especially blood-feeding insects. The availability of genome sequences for several insect disease vectors allows for a more detailed analysis of the identification and characterization of events controlling and preventing iron-induced toxicity following a blood meal. The International Glossina Genome Initiative (IGGI) has coordinated the sequencing and annotation of the Glossina morsitans genome, which has led to the identification of 12220 genes. This knowledge base, along with the current understanding of the iron-responsive element (IRE) system in regulating iron metabolism, allowed the UTRs of Glossina genes to be investigated for the presence of these elements. Using a combination of motif enrichment and IRE stem-loop structure prediction, IRE-mediated regulation was inferred for 150 genes, of which 72 were identified with 5’-IREs and 78 with 3’-IREs. Of the identified IRE-regulated genes, the ferritin heavy chain and MRCK-alpha are the only genes previously known to have IREs, while the rest are novel genes for which putative roles in regulating iron levels in the tsetse fly have been assigned in this study. Moreover, functional inference for the identified genes points to an enrichment of transcription and translation. Furthermore, several hypothetical proteins with no defined functions were identified as IRE-regulated. These include TMP007137, TMP009128, TMP002546, TMP002921, TMP003628, TMP004581, TMP008259, TMP012389, TMP005219, TMP005827, TMP007908, TMP009332, TMP013384, TMP009102, TMP010544, TMP010707, TMP004292, TMP006517, TMP014030, TMP009821 and TMP003060, for which an iron-regulatory mechanism of action may be inferred.
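As a rough illustration of the motif-scanning step in the IRE search described above, the sketch below looks for a simplified IRE consensus in a UTR sequence. The pattern (an unpaired C, five stem nucleotides, then the loop CAGUG followed by A, C or U) is an assumed simplification of the canonical IRE, and the function name and example sequence are hypothetical; it does not reproduce the thesis pipeline, which also used motif enrichment and stem-loop structure prediction.

```python
import re

# Simplified IRE consensus (assumption): unpaired C, five stem nucleotides,
# then the loop CAGUG followed by A, C or U. A lookahead is used so that
# overlapping candidates are also reported.
IRE_PATTERN = re.compile(r"(?=(C[ACGU]{5}CAGUG[ACU]))")


def find_ire_candidates(utr):
    """Return (start, matched_sequence) pairs for candidate IRE motifs in a UTR."""
    # Accept DNA or RNA input by converting to upper-case RNA first.
    rna = utr.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in IRE_PATTERN.finditer(rna)]


# Hypothetical UTR fragment used only to exercise the function.
hits = find_ire_candidates("GGCTTCAACAGTGCTTGG")
```

A sequence-only match like this would normally be treated as a candidate to be checked by secondary-structure prediction, since the motif alone says nothing about whether the flanking nucleotides can pair into a stem.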
We further report 26 IRE-regulated secreted proteins in Glossina that are good candidates for further investigation pertaining to the development of novel vector control strategies. Using the predicted data on the identified IRE-regulated genes and their functional classification, we arrived at 29 genes with putative roles in iron trafficking, among which several unknown and hypothetical proteins are included. Thus, a novel role is inferred for these genes in cellular binding and transport in the context of iron metabolism. It is therefore possible that these genes may have evolved in Glossina such that they compensate for the absence of an IRE-regulated mechanism for transferrin. Additionally, we propose 14 IRE-regulated genes involved in immune and stress response, which may play crucial roles at the host-pathogen interface through their possible mechanisms of iron sequestration. Using subcellular localization analysis, we further categorized the putative IRE-regulated genes into several subcellular localizations; the majority of genes were found within the nucleus and the cytosol. The detection of conserved motifs in a set of genes is an interesting yet sophisticated area of research that allows either co-regulated or orthologous genes to be identified, while further providing support for the putative function of a set of genes that would otherwise remain uncharacterized. This is based on the notion that co-regulated genes are often co-expressed to carry out a specific function. As such, 14 regulatory elements were identified in the 5’- and 3’-UTRs of IRE-regulated genes involved in embryonic development and reproduction, inflammation and immune response, signaling pathways and neurogenesis, as well as DNA repair. This study further proposes several IRE-regulated genes as targets for micro-RNA regulation by identifying micro-RNA binding sites in their 3’-UTRs.
Using a motif clustering approach, we clustered IRE-regulated genes based on the number of motifs they share. Significantly co-regulated genes sharing two or more motifs were identified as critical targets for future investigation. The expression map of IRE-regulated genes was analyzed to better understand the events taking place from 3 hours to 15 days following a blood meal. Re-analysis of an Anopheles microarray chip showed significant expression of three cell envelope and transport genes as an early response and six as a late response to a blood meal, which could be assigned a putative role in iron trafficking. Genes identified in this study with implications in iron metabolism, whose timely expression allows iron homeostasis to be maintained, represent good targets for future work. Considering the important role of evolution in species adaptation to habits such as hematophagy, it is important to identify evolutionary signatures associated with these changes. To distinguish between evolutionary forces that are specific to iron metabolism in blood-feeding insects and those that are found in other insects, the IRE-regulated genes were clustered into orthologous groups using several blood-feeding and non-blood-feeding insect species. Assessment of different evolutionary scenarios using the Maximum Likelihood (ML) approach points to variations in the evolution of IRE-regulated genes between the two insect groups, whereby several genes show an increased mutation rate in the blood-feeding insect group relative to their non-blood-feeding counterparts. These include TMP003602 (phosphoinositide 3-kinase), TMP009157 (ubiquitin-conjugating enzyme 9), TMP010317 (general transcription factor IIH subunit 1), TMP011104 (serine-pyruvate mitochondrial), TMP013137 (pentatricopeptide Transcription and translation), TMP013886 (tRNA(uridine-2-o-)-methyl-transferase-trm7) and TMP014187 (mediator 100kD).
Additionally, we have indicated the presence of positively selected sites within seven blood-feeding IRE-regulated genes, namely TMP002520 (nucleoporin), TMP008942 (eukaryotic translation initiation factor 3), TMP009871 (bruno-3 transcript), TMP010317 (general transcription factor IIH subunit 1), TMP010673 (ferritin heavy-chain protein), TMP011104 (serine-pyruvate mitochondrial) and TMP011448 (brain chitinase and chia). Thus, the results of this study provide an in-depth understanding of iron metabolism in Glossina morsitans and offer important targets for future validation, based on which innovative control strategies may be designed.

Computational strategies to identify, prioritize and design potential antimalarial agents from natural products
(University of the Western Cape, 2015) Egieyeh, Samuel Ayodele; Christoffels, Alan; Malan, Sarel; Syce, James
Introduction: There is an exigent need to develop novel antimalarial drugs in view of the mounting disease burden and emergent resistance to the drugs presently used against the malarial parasites. A large number of natural products, especially those used in ethnomedicine for malaria, have shown varying in-vitro antiplasmodial activities. Facilitating antimalarial drug development from this wealth of natural products is an imperative and laudable mission to pursue. However, limited resources, high cost, low prospects and the high cost of failure during preclinical and clinical studies might militate against the pursuit of this mission. Chemoinformatics techniques can simulate and predict essential molecular properties required to characterize compounds, thus eliminating the cost of equipment and reagents needed to conduct essential preclinical studies, especially on compounds that may fail during drug development.
Therefore, applying chemoinformatics techniques to natural products with in-vitro antiplasmodial activities may facilitate the identification and prioritization of natural products with potential for a novel mechanism of action, desirable pharmacokinetics and a high likelihood of development into antimalarial drugs. In addition, unique structural features mined from these natural products may serve as templates for designing new potential antimalarial compounds. Method: Four chemoinformatics techniques were applied to a collection of selected natural products with in-vitro antiplasmodial activity (NAA) and currently registered antimalarial drugs (CRAD): molecular property profiling, molecular scaffold analysis, machine learning and design of a virtual compound library. Molecular property profiling included computation of key molecular descriptors, physicochemical properties, molecular similarity analysis, estimation of drug-likeness, in-silico pharmacokinetic profiling and exploration of the structure-activity landscape. Analysis of variance was used to assess statistically significant differences in these parameters between NAA and CRAD. Next, molecular scaffold exploration and diversity analyses were performed on three datasets (NAA, CRAD and malarial data from the Medicines for Malaria Venture (MMV)) using scaffold counts and cumulative scaffold frequency plots. Scaffolds from the NAA were compared to those from CRAD and MMV. A Scaffold Tree was also generated for all the datasets. Thirdly, machine learning approaches were used to build four regression and four classifier models from bioactivity data of NAA using molecular descriptors and molecular fingerprints. Models were built and refined by leave-one-out cross-validation and evaluated with an independent test dataset. The applicability domain (AD), which defines the limit of reliable predictability by the models, was estimated from the training dataset and validated with the test dataset.
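As a rough illustration of the leave-one-out cross-validation mentioned above, the sketch below holds out one compound at a time, fits a model on the remainder and scores the held-out prediction. The descriptor values, activity labels and the 1-nearest-neighbour classifier are hypothetical stand-ins chosen to keep the example self-contained; they are not the models or data used in the thesis.

```python
def nearest_neighbour_label(train, query):
    """Return the label of the training point closest to query (squared Euclidean distance)."""
    closest = min(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    return closest[1]


def leave_one_out_accuracy(data):
    """Hold out each sample in turn, predict it from the rest, and return the fraction correct."""
    correct = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]  # every sample except the held-out one
        if nearest_neighbour_label(train, features) == label:
            correct += 1
    return correct / len(data)


# Hypothetical two-descriptor compounds with made-up activity labels.
compounds = [
    ((0.0, 0.1), "active"),
    ((0.1, 0.0), "active"),
    ((1.0, 1.1), "inactive"),
    ((1.1, 1.0), "inactive"),
]
accuracy = leave_one_out_accuracy(compounds)
```

Because every compound is used for testing exactly once, this scheme suits small datasets, but a separate test set, as used in the study, is still needed for an estimate that is independent of model refinement.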
Possible chemical features associated with the reported antimalarial activities of the compounds were also extracted. Lastly, virtual compound libraries were generated from the unique molecular scaffolds identified in the NAA. The virtual compounds generated were characterized by evaluating selected molecular descriptors, toxicity profile, structural diversity from CRAD and predicted antiplasmodial activity. Results: For the molecular property profiling, a total of 1040 natural products were selected and a total of 13 molecular descriptors were analyzed. Significant differences were observed between the natural products with in-vitro antiplasmodial activities (NAA) and currently registered antimalarial drugs (CRAD) for at least 11 of the molecular descriptors. Molecular similarity and chemical space analysis identified NAA that were structurally diverse from CRAD. Over 50% of NAA were identified as having desirable drug-like properties. However, nearly 70% of NAA were identified as potentially "promiscuous" compounds. Structure-activity landscape analysis highlighted compound pairs that formed "activity cliffs". In all, prioritization strategies for the natural products with in-vitro antiplasmodial activities were proposed. The scaffold exploration and analysis results revealed that CRAD exhibited the greatest scaffold diversity, followed by NAA and MMV respectively. Unique scaffolds that were not contained in any compounds in the CRAD dataset were identified in NAA. The Scaffold Tree showed the preponderance of ring systems in NAA and identified virtual scaffolds, which may be potential bioactive compounds or may elucidate possible synthetic routes for the NAA. From the machine learning study, the regression and classifier models most suitable for NAA were identified as the model tree M5P (correlation coefficient = 0.84) and Sequential Minimal Optimization (accuracy = 73.46%) respectively.
The test dataset fitted into the applicability domain (AD) defined by the training dataset. The “amine” group was observed to be essential for antimalarial activity in both the NAA and MMV datasets, but hydroxyl and carbonyl groups may also be relevant in the NAA dataset. The characterization of the virtual compound library showed significant differences (p < 0.05) between the virtual compound library and currently registered antimalarial drugs in some molecular descriptors (molecular weight, log partition coefficient, hydrogen bond donors and acceptors, polar surface area, shape index, chiral centres, and synthetic feasibility). Tumorigenic and mutagenic substructures were not observed in a large proportion (> 90%) of the virtual compound library. The virtual compound libraries showed sufficient structural diversity, and the majority were structurally diverse from currently registered antimalarial drugs. Finally, up to 70% of the virtual compounds were predicted to be active antiplasmodial agents. Conclusions: Molecular property profiling of natural products with in-vitro antiplasmodial activities (NAA) and currently registered antimalarial drugs (CRAD) produced a wealth of information that may guide decisions and facilitate antimalarial drug development from natural products, and led to a prioritized list of natural products with in-vitro antiplasmodial activities. Molecular scaffold analysis identified unique scaffolds and virtual scaffolds from NAA that possess desirable drug-like properties, which make them ideal starting points for antimalarial drug design. The machine learning study built, evaluated and identified amply accurate regression and classifier models that were used for virtual screening of natural compound libraries to mine possible antimalarial compounds without the expense of bioactivity assays.
Finally, a good proportion of the virtual compounds generated were structurally diverse from currently registered antimalarial drugs and were potentially active antiplasmodial agents. Filtering and optimization may lead to a collection of virtual compounds with unique chemotypes that may be synthesized and added to a screening deck against Plasmodium.

Development of a comprehensive annotation and curation framework for analysis of Glossina morsitans morsitans expressed sequence tags
(University of the Western Cape, 2011) Wamalwa, Mark; Christoffels, Alan; South African National Bioinformatics Institute (SANBI); Faculty of Science
This study has successfully identified transcripts differentially expressed in the salivary gland and midgut and provides candidate genes that are critical to the response to parasite invasion. Furthermore, an open-source Glossina resource (G-ESTMAP) was developed that provides interactive features and browsing of functional genomics data for researchers working in the field of trypanosomiasis on the African continent.

Development of a hepatitis C virus knowledgebase with computational prediction of functional hypothesis of therapeutic relevance
(2011) Samuel, Kojo Kwofie; Bajic, Vladimir; Christoffels, Alan
Ameliorating Hepatitis C Virus (HCV) therapeutic and diagnostic challenges requires robust intervention strategies, including approaches that leverage the plethora of rich data published in biomedical literature to gain a greater understanding of HCV pathobiological mechanisms. The multitude of metadata originating from HCV clinical trials as well as from low- and high-throughput experiments embedded in text corpora can be mined as data sources for the implementation of HCV-specific resources. HCV-customized resources may support the generation of worthy and testable hypotheses and reveal potential research clues to augment the pursuit of efficient diagnostic biomarkers and therapeutic targets.
This research thesis reports the development of two freely available HCV-specific web-based resources: (i) Dragon Exploratory System on Hepatitis C Virus (DESHCV) accessible via http://apps.sanbi.ac.za/DESHCV/ or http://cbrc.kaust.edu.sa/deshcv/ and (ii) Hepatitis C Virus Protein Interaction Database (HCVpro) accessible via http://apps.sanbi.ac.za/hcvpro/ or http://cbrc.kaust.edu.sa/hcvpro/. DESHCV is a text mining system implemented using named concept recognition and co-occurrence-based approaches to computationally analyze about 32,000 HCV-related abstracts obtained from PubMed. As part of DESHCV development, the pre-constructed dictionaries of the Dragon Exploratory System (DES) were enriched with HCV biomedical concepts, including HCV proteins, name variants and symbols to enable HCV knowledge-specific exploration. The DESHCV query inputs consist of user-defined keywords, phrases and concepts. DESHCV is therefore an information extraction tool that enables users to computationally generate associations between concepts and supports the prediction of potential hypotheses with diagnostic and therapeutic relevance. Additionally, users can retrieve a list of abstracts containing tagged concepts that can be used to overcome the herculean task of manual biocuration. DESHCV has been used to simulate a previously reported thalidomide-chronic hepatitis C hypothesis and also to model a potentially novel thalidomide-amantadine hypothesis. HCVpro is a relational knowledgebase dedicated to housing experimentally detected HCV-HCV and HCV-human protein interaction information obtained from other databases and curated from biomedical journal articles. Additionally, the database contains consolidated biological information consisting of hepatocellular carcinoma (HCC) related genes, comprehensive reviews on HCV biology and drug development, functional genomics and molecular biology data, and cross-referenced links to canonical pathways and other essential biomedical databases. 
Users can retrieve enriched information including interaction metadata from HCVpro by using protein identifiers, gene chromosomal locations, experiment types used in detecting the interactions, PubMed IDs of journal articles reporting the interactions, annotated protein interaction IDs from external databases, and via “string searches”. The utility of HCVpro has been demonstrated by harnessing integrated data to suggest putative baseline clues that seem to support current diagnostic exploratory efforts directed towards vimentin. Furthermore, eight genes (ACLY, AZGP1, DDX3X, FGG, H19, SIAH1, SERPING1 and THBS1) have been recommended for possible investigation to evaluate their diagnostic potential. The data archived in HCVpro can be utilized to support protein-protein interaction network-based candidate HCC gene prioritization for possible validation by experimental biologists.
Item Development of a Hepatitis C Virus knowledgebase with computational prediction of functional hypothesis of therapeutic relevance (University of the Western Cape, 2011) Kojo, Kwofie Samuel; Bajic, Vladimir; Christoffels, Alan; South African National Bioinformatics Institute (SANBI)
Ameliorating Hepatitis C Virus (HCV) therapeutic and diagnostic challenges requires robust intervention strategies, including approaches that leverage the plethora of rich data published in biomedical literature to gain greater understanding of HCV pathobiological mechanisms. The multitudes of metadata originating from HCV clinical trials as well as low and high-throughput experiments embedded in text corpora can be mined as data sources for the implementation of HCV-specific resources. HCV-customized resources may support the generation of worthy and testable hypotheses and reveal potential research clues to augment the pursuit of efficient diagnostic biomarkers and therapeutic targets. 
This research thesis reports the development of two freely available HCV-specific web-based resources: (i) Dragon Exploratory System on Hepatitis C Virus (DESHCV) accessible via http://apps.sanbi.ac.za/DESHCV/ or http://cbrc.kaust.edu.sa/deshcv/ and (ii) Hepatitis C Virus Protein Interaction Database (HCVpro) accessible via http://apps.sanbi.ac.za/hcvpro/ or http://cbrc.kaust.edu.sa/hcvpro/. DESHCV is a text mining system implemented using named concept recognition and co-occurrence-based approaches to computationally analyze about 32,000 HCV-related abstracts obtained from PubMed. As part of DESHCV development, the pre-constructed dictionaries of the Dragon Exploratory System (DES) were enriched with HCV biomedical concepts, including HCV proteins, name variants and symbols to enable HCV knowledge-specific exploration. The DESHCV query inputs consist of user-defined keywords, phrases and concepts. DESHCV is therefore an information extraction tool that enables users to computationally generate associations between concepts and supports the prediction of potential hypotheses with diagnostic and therapeutic relevance. Additionally, users can retrieve a list of abstracts containing tagged concepts that can be used to overcome the herculean task of manual biocuration. DESHCV has been used to simulate a previously reported thalidomide-chronic hepatitis C hypothesis and also to model a potentially novel thalidomide-amantadine hypothesis. HCVpro is a relational knowledgebase dedicated to housing experimentally detected HCV-HCV and HCV-human protein interaction information obtained from other databases and curated from biomedical journal articles. Additionally, the database contains consolidated biological information consisting of hepatocellular carcinoma (HCC) related genes, comprehensive reviews on HCV biology and drug development, functional genomics and molecular biology data, and cross-referenced links to canonical pathways and other essential biomedical databases. 
Users can retrieve enriched information including interaction metadata from HCVpro by using protein identifiers, gene chromosomal locations, experiment types used in detecting the interactions, PubMed IDs of journal articles reporting the interactions, annotated protein interaction IDs from external databases, and via “string searches”. The utility of HCVpro has been demonstrated by harnessing integrated data to suggest putative baseline clues that seem to support current diagnostic exploratory efforts directed towards vimentin. Furthermore, eight genes (ACLY, AZGP1, DDX3X, FGG, H19, SIAH1, SERPING1 and THBS1) have been recommended for possible investigation to evaluate their diagnostic potential. The data archived in HCVpro can be utilized to support protein-protein interaction network-based candidate HCC gene prioritization for possible validation by experimental biologists.
Item Development of an operon detection algorithm to analyze gene regulation in drug resistant Mycobacterium tuberculosis (University of the Western Cape, 2022) Calvert-Joshua, Tracey; Christoffels, Alan
In prokaryotes, operon structures often form to allow microorganisms to respond rapidly and efficiently to changing environmental conditions. Operons are sets of neighbouring genes which are co-regulated and co-transcribed. Studies have shown evidence of operons changing their lengths and/or maintaining their lengths while up- or downregulating their expression levels when exposed to various stresses. 
Since several operons have also been associated with drug resistance, having access to the operon map of Mycobacterium tuberculosis (Mtb) may give us insight into the existing mechanisms employed by Mtb to circumvent drug stress, and more importantly, it may allow us to target larger sections of a genome when designing antitubercular drugs.
Item Evolution of HIV-1 subtype C gp120 envelope sequences in the female genital tract and blood plasma during acute and chronic infection (University of the Western Cape, 2014) Ramdayal, Kavisha; Harkins, Gordon; Christoffels, Alan
Heterosexual transmission of HIV-1 via the female genital tract is the leading route of HIV infection in sub-Saharan Africa. Viruses then traffic between the cervical compartment and blood, ensuring pervasive infection. Previous studies have, however, reported the existence of genetically diverse viral populations in various tissue types, each evolving under separate selective pressures within a single individual, though it is still unclear how compartmentalization dynamics change over acute and chronic infection in the absence of ARVs. To better characterize intrahost evolution and the movement of viruses between different anatomical tissue types, statistical and phylogenetic methods were used to reconstruct temporal dynamics between blood plasma and cervico-vaginal lavage (CVL) derived HIV-1 subtype C gp120 envelope sequences. A total of 206 cervical and 253 blood plasma sequences obtained from four treatment-naïve women enrolled in the CAPRISA Acute Infection study cohort in South Africa were evaluated for evidence of genotypic and phenotypic differences between viral populations from each tissue type up to 3.6 years post-infection. Evidence for tissue-specific differences in genetic diversity, V-loop length variation, codon-based selection, co-receptor usage, hypermutation, recombination and potential N-linked glycosylation (PNLG) site accumulation was investigated. 
Of the four participants studied, two anonymously identified as CAP270 and CAP217 showed evidence of infection with a single HIV-1 variant, whereas CAP177 and CAP261 showed evidence of infection by more than one variant. As a result, genetic diversity, PNLGs accumulation and the number of detectable recombination events along the gp120 env region were lowest in the former patients and highest in the latter. Overall, genetic diversity increased over the course of infection in all participants and correlated significantly with viral load measurements from the blood plasma in one of the four participants tested (i.e. CAP177). Employing a structured coalescent model approach, rates of viral migration between anatomical tissue types on time-measured genealogies were also estimated. No persistent evidence for the existence of separate viral populations in the cervix and blood plasma was found in any of the participants and instead, sequences generally clustered together by time point on Bayesian Maximum Clade Credibility (MCC) trees. Clades that were monophyletic by tissue type comprised mostly of low diversity or monotypic sequences from the same time point, consistent with bursts of viral replication. Tissue-specific monophyletic clades also generally contained few sequences and were interspersed among sequences from both tissue-types. Tree and distance-based statistical tests were employed to further evaluate the degree to which cervical and blood plasma viruses clustered together on Bayesian MCC trees using the Slatkin-Maddison (S-M), Simmonds Association index (AI), Monophyletic Clade (MC), Wright’s measure of population subdivision (FST) and Hudson’s Nearest Neighbour (Snn) statistics, in the presence and absence of monotypic and low diversity sequences. 
Statistical evidence for the presence of tissue-specific population structure disappeared or was greatly reduced after the removal of monotypic and low diversity sequences, except in CAP177 and CAP217, in 3/5 of longitudinal tree and distance-based tests. Analysis of phenotypic differences between viral populations from the blood plasma and cervix revealed inconsistent tissue-specific patterns in genetic diversity, codon-based selection, co-receptor usage, hypermutation, recombination, V-loop length variation and PNLG site accumulation during acute and chronic infection among all participants. There is therefore no evidence to support the existence of distinct viral populations within the blood plasma and cervical compartments longitudinally; however, slightly constrained populations may exist within the female genital tract at isolated time points, based on the statistical findings presented in this study.
Item Generation of a human gene index and its application to disease candidacy (University of the Western Cape, 2001) Christoffels, Alan; Hide, Winston; Faculty of Science
With easy access to technology to generate expressed sequence tags (ESTs), several groups have sequenced from thousands to several thousands of ESTs. These ESTs benefit from consolidation and organization to deliver significant biological value. A number of EST projects are underway to extract maximum value from fragmented EST resources by constructing gene indices, where all transcripts are partitioned into index classes such that transcripts are put into the same index class if they represent the same gene. Therefore a gene index should ideally represent a non-redundant set of transcripts. Indeed, most gene indices aim to reconstruct the gene complement of a genome and their technological developments are directed at achieving this goal. 
The South African National Bioinformatics Institute (SANBI), on the other hand, embarked on the development of the sequence alignment and consensus knowledgebase (STACK) database that focused on the detection and visualisation of transcript variation in the context of developmental and pathological states, using all publicly available ESTs. Preliminary work on the STACK project employed an approach of partitioning the EST data into arbitrarily chosen tissue categories as a means of reducing the EST sequences to manageable sizes for subsequent processing. The tissue partitioning provided the template material for developing error-checking tools to analyse the information embedded in the error-laden EST sequences. However, tissue partitioning increases redundancy in the sequence data because one gene can be expressed in multiple tissues, with the result that multiple tissue-partitioned transcripts will correspond to the same gene. Therefore, the sequence data represented by each tissue category had to be merged in order to obtain a comprehensive view of expressed transcript variation across all available tissues. The need to consolidate all EST information provided the impetus for developing a STACK human gene index, also referred to as a whole-body index. In this dissertation, I report on the development of a STACK human gene index represented by consensus transcripts where all constituent ESTs sample single or multiple tissues in order to provide the correct developmental and pathological context for investigating sequence variation. Furthermore, the availability of a human gene index is assessed as a disease-candidate gene discovery resource. A feasible approach to construction of a whole-body index required the ability to process error-prone EST data in excess of one million sequences (1,198,607 ESTs as of December 1998). 
In the absence of new clustering algorithms at that time, we successfully ported D2_CLUSTER, an EST clustering algorithm, to the high performance shared multiprocessor machine, Origin2000. Improvements to the parallelised version of D2_CLUSTER included: (i) the ability to cluster sequences on as many as 126 processors (for example, 462000 ESTs were clustered in 31 hours on 126 R10000 MHz processors, Origin2000); (ii) enhanced memory management that allowed for clustering of mRNA sequences as long as 83000 base pairs; (iii) the ability to have the input sequence data accessible to all processors, allowing rapid access to the sequences; and (iv) a restart module that allowed a job to be restarted if it was interrupted. The successful enhancements to the parallelised version of D2_CLUSTER, as listed above, allowed for the processing of EST datasets in excess of 1 million sequences. A hierarchical approach was adopted where 1,198,607 ESTs from GenBank release 110 (October 1998) were partitioned into "tissue bins" and each tissue bin was processed through a pipeline that included masking for contaminants, clustering, assembly, assembly analysis and consensus generation. A total of 478,707 consensus transcripts were generated for all the tissue categories and these sequences served as the input data for the generation of the whole-body index sequences. The clustering of all tissue-derived consensus transcripts was followed by the collapse of each consensus sequence to its individual ESTs prior to assembly and whole-body index consensus sequence generation. The hierarchical approach demonstrated a consolidation of the input EST data from 1,198,607 ESTs to 69,158 multi-sequence clusters and 162,439 singletons (or individual ESTs). Chromosomal locations were added to 25,793 whole-body index sequences through assignment of genetic markers such as radiation hybrid markers and Généthon markers. 
The whole-body index sequences were made available to the research community through a sequence-based search engine (http://ziggy.sanbi.ac.za/~alan/researchINDEX.html).
Item Genome-wide annotation of chemosensory and glutamate-gated receptors, and related genes in Glossina morsitans morsitans tsetse fly (University of the Western Cape, 2014) Obiero, George Fredrick Opondo; Christoffels, Alan; Mireji, Paul O.; Masiga, Daniel
Tsetse flies are the sole vectors of trypanosomes that cause nagana and sleeping sickness in animals and humans respectively in tropical Africa. Tsetse are unique: adults of both sexes are exclusive blood-feeders, and females are mated young and give birth to a single mature larva in sheltered habitats per pregnancy. Tsetse use chemoreception to detect and respond to chemical stimuli, helping them to locate hosts, mates, larviposition and resting sites. The detection is facilitated by chemoreceptors expressed on sensory neurons to cause specific responses. Specific molecular factors that mediate these responses are poorly understood in tsetse flies. This study aimed to identify and characterize genes that potentially mediate chemoreception in Glossina morsitans morsitans tsetse flies. These genes included sensory odorant (OR), gustatory (GR), ionotropic (IR), and related genes for odorant-binding (OBP), chemosensory (CSP) and sensory neuron membrane (SNMP) proteins. Synaptic transmission in higher brain sites may involve ionotropic glutamate-gated (iGluR) and metabotropic glutamate-gated (mGluR) receptors. The genes were annotated in the G. m. morsitans genome scaffold assembly GMOY1.1 (Yale strain) using orthologs from D. melanogaster as queries via the TBLASTX algorithm at an e-value below 1e-03. Positive BLAST hits were seeded as gene constructs in their respective scaffolds, and used as a genomic reference onto which female fly-derived RNA sequence reads were mapped using the CLC Genomics Workbench suite. 
Seeded gene models were modified using RNA-Seq reads, then viewed and re-edited using the Artemis genome viewer tool. The genome was iteratively searched using the G. m. morsitans gene model sequences to recover additional similar hit sequences. The gene models were confirmed through comparisons against the NCBI conserved domains database (CDD) and the non-redundant Swiss-Prot database. Trans-membrane domains and secretory peptides were predicted using the TMHMM and SignalP tools respectively. Putative functions of the genes were confirmed via Blast2GO searches against the gene ontology database. Evolutionary relationships amongst and between the genes were established using maximum likelihood estimates with the best-fitting amino acid model test in the MEGA5 suite and the PhyML tool. Expression profiles of genes were estimated using the RNA-seq data via the CLC Genomics RNA-sequence analysis pipeline. Overall, 46 ORs, 14 GRs, and 19 IRs were identified, of which 21, 6 and 4 were manually identified for ORs, GRs, and IRs respectively. Additionally, 15 iGluRs, 6 mGluRs, 5 CSPs, 15 CD36-like, and 32 OBPs were identified. Six copies of OR genes (GmmOR41-46) were homologous to DmelOr67d, a single-copy cis-vaccenyl acetate (cVA) receptor. Genes whose receptor homologs are associated with responses to CO2, GmmGR1-4, had higher expression profiles amongst the Glossina GR genes. Known core-receptor homologs OR1, IR8a, IR25a and IR64a were conserved, and three species-specific divergent IRs (IR10a, IR56b and IR56d) were identified. Homologs of GluRIID, IR93a, and sweet taste receptors (Gr5a and Gr64a) were not identified in the genome. The homolog of the LUSH protein, GmmOBP26, and the sensory neuron membrane receptors SNMP1 and SNMP2 were conserved in the genome. Results indicate a reduced repertoire of the chemosensory genes, and suggest a reduced host range of the tsetse flies compared to other Diptera. 
Genes in multiple copies suggest their prioritization in chemoreception, which in turn may be tied to high specificity in host selection. Genes with high sequence conservation and expression profiles probably relate to their broad expression and utility within the fly nervous system. These results lay a foundation for future comparative studies with other insects, provide opportunities for functional studies, and form the basis for re-examining new approaches for improving tsetse control tools and possible drug targets based on chemoreception.
Item Genomic epidemiology of Rift Valley fever in East Africa (University of the Western Cape, 2023) Juma, John; Christoffels, Alan
Rift Valley fever (RVF) is a climate-driven zoonotic disease of significant importance to public and livestock health given its epidemic potential. The disease was first identified in 1930 in the Great Rift Valley region of Kenya following massive abortion among pregnant ewes on a sheep farm. RVF is caused by the Rift Valley fever virus (RVFV) belonging to the genus Phlebovirus, family Phenuiviridae and order Bunyavirales. RVF primarily affects domestic ruminants, wild animals and humans with varying degrees of fatalities. In this PhD research, tools and methods that will be of immense importance in conducting genomic surveillance of RVFV were developed. Furthermore, we have applied phylodynamic and phylogeographic approaches to understand the transmission dynamics of the virus in East Africa. In the second chapter, we utilized publicly available genetic sequence data of the virus to build a tool that rapidly detects circulating lineages. A general observation made in this work is the paucity of RVFV genetic data globally. Only a handful of sequence data is available from countries that have previously experienced RVF outbreaks such as South Africa, Kenya and Madagascar. 
The genetic information in the sequence data is crucial in identifying lineage-defining single nucleotide polymorphisms (SNPs) or mutations. Using these mutations, we developed a command line tool that rapidly characterizes RVFV isolates using the 15-letter (A to O) nomenclature.
Item Identification and characterization of microRNAs and their putative target genes in Anopheles funestus s.s (2013) Ali, Mushal Allam Mohamed Alhaj; Christoffels, Alan
The discovery of microRNAs (miRNAs) is one of the most exciting scientific breakthroughs in the last decade. miRNAs are short RNA molecules that do not encode proteins but instead regulate gene expression. Over the past several years, thousands of miRNAs have been identified in various insect genomes through cloning and sequencing, and even by computational prediction. However, information concerning possible roles of miRNAs in mosquitoes is limited. Within this context, we report here the first systematic analysis of these tiny RNAs and their target mRNAs in one of the principal African malaria vectors, Anopheles funestus s.s. Firstly, to extend the known repertoire of miRNAs expressed in this insect, the small RNAs from the four developmental stages (egg, larvae, pupae and the adult females) were sequenced using next generation sequencing technology. A total of 98 miRNAs were identified, which included 65 known Anopheles miRNAs, 25 miRNAs conserved in other insects and 8 novel miRNAs that had not been reported in any species. We further characterized new variants for miR-2 and miR-927 and stem-loop precursors for miR-286 and miR-2944. The analysis showed that many miRNAs have stage-specific expression, and are co-transcribed and co-regulated during development. 
Secondly, for a better understanding of the molecular details of miRNA function, we identified the target genes for the Anopheles miRNAs using a novel approach that identifies overlapping genes among three target prediction tools, followed by filtering genes based on functional enrichment of GO terms and KEGG pathways. We found that most of the miRNAs are metabolic regulators. Moreover, the results suggest the implication of some miRNAs not only in development but also in insect-parasite interaction. Finally, we developed the InsecTar database (http://insectar.sanbi.ac.za) for miRNA targets in three mosquito species, Anopheles gambiae, Aedes aegypti, and Culex quinquefasciatus, which incorporates prediction and functional analysis of these target genes. The proposed database will assist in exploring the roles of these regulatory molecules in insects. This type of analysis is a key step towards improving our understanding of the complexity and regulation mode of miRNAs in mosquitoes. Moreover, this study opens the door for exploration of miRNAs in the regulation of critical physiological functions specific to vector arthropods, which may lead to novel approaches to combat mosquito-borne infectious diseases.
Item Investigating genetic diversity and microRNA of Hermetia illucens (the black soldier fly) to breed for mass production of a novel sustainable protein (University of the Western Cape, 2023) De Raedt, Sarah Joanne; Christoffels, Alan
Introduction: A new sustainable source of protein is needed to meet the demands of the growing global population. Insect meal is a suitable replacement, and the black soldier fly (BSF) is the most used insect in industrial rearing. The black soldier fly larvae (BSFL) are not only high in sought-after nutrients (protein, fat, and chitin/source of fiber), but they also reduce organic waste that would go into landfills by consuming the waste and leaving behind a beneficial residue, which is used in fertilizers. 
However, little has been published on the genetics of BSF, which is crucial to optimizing the mass breeding programs necessary to meet the population demands. The aim of this study was to further the base of knowledge beneficial to mass rearing protocols by describing the genetic diversity of 3 populations, under differing scales of rearing, and the microRNA expression profile across 5 life stages, along with the first report of the novel microRNA of BSF.
Item Next generation sequencing approaches for novel gene discovery in South African Parkinson’s disease families (University of the Western Cape, 2022) Pillay, Nikita Simone; Christoffels, Alan
In the last decade, next-generation sequencing (NGS) approaches have revolutionised the study of human genomics, particularly aiding the understanding of genetic diseases. Parkinson’s disease (PD) is a complex neurodegenerative disorder with a heterogenous genetic disposition. This disorder is clinically characterised by the progressive loss of dopaminergic neurons in the substantia nigra pars compacta (SNpc). This results in a severe decrease of available dopamine that manifests as a myriad of both motor and non-motor symptoms. Several genes, including α-synuclein (SNCA), parkin (PRKN), leucine-rich repeat kinase 2 (LRRK2), PTEN induced putative kinase 1 (PINK1), and protein deglycase (DJ-1), are confirmed as disease-causing in autosomal recessive (AR), autosomal dominant (AD), early-onset (EO), and late-onset (LO) forms of the disorder.
Item Optimisation of proteomics techniques for archival tumour blocks of a South African cohort of colorectal cancer (University of Western Cape, 2020) Rossouw, Sophia Catherine; Christoffels, Alan; Rigby, Jonathan
Tumour-specific protein markers are usually present at elevated concentrations in patient biopsy tissue; therefore tumour tissue is an ideal biological material for studying cancer proteomics and biomarker discovery studies. 
To understand and elucidate cancer pathogenesis and its mechanisms at the molecular level, the collection and characterisation of a large number of individual patient tissue cohorts are required. Since most pathology institutes routinely preserve biopsy tissues by standardised methods of formalin fixation and paraffin embedment, these archived FFPE tissues are important collections of pathology material, often accompanied by important metadata, such as patient medical history and treatments. FFPE tissue blocks are conveniently stored under ambient conditions for decades, while retaining cellular morphology due to the modifications induced by formalin.
Item Prediction of antimicrobial peptides using hyperparameter optimized support vector machines (University of the Western Cape, 2011) Gabere, Musa Nur; Vladimir, Bajic; Christoffels, Alan; South African National Bioinformatics Institute (SANBI); Faculty of Science
Antimicrobial peptides (AMPs) play a key role in the innate immune response. They can be ubiquitously found in a wide range of eukaryotes including mammals, amphibians, insects, plants, and protozoa. In lower organisms, AMPs function merely as antibiotics by permeabilizing cell membranes and lysing invading microbes. Prediction of antimicrobial peptides is important because experimental methods used in characterizing AMPs are costly, time consuming and resource intensive, and because identification of AMPs in insects can serve as a template for the design of novel antibiotics. In order to fulfil this, firstly, data on antimicrobial peptides is extracted from UniProt, manually curated and stored in a centralized database called the dragon antimicrobial peptide database (DAMPD). Secondly, based on the curated data, models to predict antimicrobial peptides are created using support vector machines with optimized hyperparameters. 
In particular, global optimization methods such as grid search, pattern search and derivative-free methods are utilised to optimize the SVM hyperparameters. These models are useful in characterizing unknown antimicrobial peptides. Finally, a webserver is created that will be used to predict antimicrobial peptides in haematophagous insects such as Glossina morsitans and Anopheles gambiae.
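
The grid search mentioned in the abstract above can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a linear SVM trained by Pegasos-style subgradient descent as a stand-in for whatever SVM variant the thesis used, synthetic two-feature "peptide" data (net charge and hydrophobic fraction are hypothetical features chosen for illustration), and a hypothetical grid over the regularisation strength `lam` and the number of training epochs. All function names are made up for this example.

```python
import itertools
import random


def make_toy_data(n_per_class=20, seed=42):
    """Synthetic, illustrative data only: [net charge, hydrophobic fraction]."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_per_class):
        X.append([rng.gauss(4.0, 1.0), rng.gauss(0.5, 0.1)])   # "AMP-like"
        y.append(1)
        X.append([rng.gauss(-1.0, 1.0), rng.gauss(0.3, 0.1)])  # "non-AMP-like"
        y.append(-1)
    return X, y


def train_linear_svm(X, y, lam, epochs, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularised hinge loss."""
    rng = random.Random(seed)
    w, b, t = [0.0] * len(X[0]), 0.0, 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularisation)
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b


def predict(model, x):
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def cv_accuracy(X, y, lam, epochs, k=3):
    """k-fold cross-validated accuracy; fold membership is index modulo k."""
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        model = train_linear_svm([X[i] for i in train], [y[i] for i in train], lam, epochs)
        correct += sum(predict(model, X[i]) == y[i] for i in test)
    return correct / len(X)


def grid_search(X, y, grid):
    """Evaluate every hyperparameter combination and keep the best CV score."""
    best_params, best_score = None, -1.0
    for lam, epochs in itertools.product(grid["lam"], grid["epochs"]):
        score = cv_accuracy(X, y, lam, epochs)
        if score > best_score:
            best_params, best_score = {"lam": lam, "epochs": epochs}, score
    return best_params, best_score


if __name__ == "__main__":
    X, y = make_toy_data()
    params, score = grid_search(X, y, {"lam": [0.01, 0.1, 1.0], "epochs": [5, 20]})
    print("best hyperparameters:", params, "cross-validated accuracy:", score)
```

Grid search is exhaustive, so its cost grows with the product of the grid sizes; the pattern search and derivative-free methods named in the abstract explore the same space with fewer evaluations, and are not shown here.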