Magister Scientiae - MSc (Bioinformatics)
Browsing by Title
Now showing 1 - 20 of 44
Item Analyses of sequence divergence using completely sequenced genomes (University of the Western Cape, 2003) Nembaware, Victoria P.; Seoighe, Cathal
Using the complete genome of Saccharomyces cerevisiae, which duplicated after its speciation from Kluyveromyces lactis, a dataset of 119 putative S. cerevisiae - K. lactis ortholog pairs was constructed. S. cerevisiae paralogous pairs that are likely to have duplicated during the whole genome duplication of S. cerevisiae were obtained, and the approach taken in our previous work (Nembaware et al., 2002) was repeated to test whether the presence of a paralogue in S. cerevisiae had an effect on the rate of sequence divergence of the 119 pairs of orthologous genes. We found, however, that substitutions at synonymous sites had reached saturation, and this prevented us from repeating the previous finding with S. cerevisiae and K. lactis. From this study, a publicly available web server (http://hamlyn.sanbi.ac.zal-victoria) that automates the calculation of Ka:Ks values given a pair of homologous CDS sequences is presented.
Item Assessment of genome visualization tools relevant to HIV genome research: development of a genome browser prototype (University of the Western Cape, 2004) Boardman, Anelda Philine; Hide, Winston; Faculty of Science
Over the past two decades of HIV research, effective vaccine candidates have been elusive. Traditionally, viral research has been characterized by a gene-by-gene approach, but in the light of the availability of complete genome sequences and the tractable size of the HIV genome, a genomic approach may improve insight into the biology and epidemiology of this virus. A genomic approach to finding HIV vaccine candidates can be facilitated by the use of genome sequence visualization. Genome browsers have been used extensively by various groups to shed light on the biology and evolution of several organisms including human, mouse, rat, Drosophila and C. elegans.
Application of a genome browser to HIV genomes and related annotations can yield insight into forces that drive evolution, identify highly conserved regions as well as regions that yield a strong immune response in patients, and track mutations that appear over the course of infection. Access to graphical representations of such information is bound to support the search for effective HIV vaccine candidates. This study aimed to answer the question of whether a tool or application exists that can be modified to be used as a platform for development of an HIV visualization application and to assess the viability of such an implementation. Existing applications can only be assessed for their suitability as a basis for development of an HIV genome browser once a well-defined set of assessment criteria has been compiled.
Item Characterising the Prevalence and Mode of CXCR4 Usage in HIV-1 Group M Subtype C (University of the Western Cape, 2013) Crous, Saleema; Travers, Simon A
Determination of CXCR4-usage patterns is essential in establishing suitability of CCR5 antagonist prescription in HIV-1 infected individuals to prevent treatment failure. Previous studies have suggested a switch to CXCR4-usage to be far less common in subtype C, yet recent studies have reported between 30 - 50% CXCR4-usage in this subtype. However, CXCR4-usage in subtype C is poorly characterised. Furthermore, the reliability of available genotypic algorithms is unknown for subtype C sequences. In this study, a comparative analysis of the predictive ability of several subtype B-modeled genotyping algorithms in subtype C tropism determination was undertaken. A total of 731 HIV-1 subtype C V3 sequences with phenotypically determined coreceptor tropism were collated from several sources. Datasets of 349 CCR5, 25 CXCR4 exclusive and 31 R5X4 (Dual) sequences were submitted to 11 tropism prediction tools.
The best performing tool was used to determine the tropism of 12,121 subtype C V3 sequences with unknown phenotypes, in order to characterise the prevalence and method of CXCR4 usage in HIV-1 subtype C. We determined that geno2pheno with a false positive rate of 5% is the best approach for predicting CXCR4-usage in subtype C sequences with an accuracy of 94% (89% sensitivity and 99% specificity). Contrary to what has been reported for subtype B, the optimal approaches for prediction of CXCR4-usage in sequences from viruses that use CXCR4 exclusively also perform best at predicting CXCR4-use in dual-tropic viral variants. Furthermore, we find that a switch to CXCR4 usage has been seen in subtype C for well over 20 years and has occurred consistently over time. At 5%, the frequency of CXCR4-usage in subtype C database records is lower than previous reports for both subtype C and B. The Geno2pheno coreceptor tool may be used as a reliable genotypic predictor in clinical settings to establish the viability of CCR5-antagonist therapies using drugs such as Maraviroc and provides a rapid and cost-effective alternative to phenotypic testing in resource-limited areas. A switch to CXCR4-usage in subtype C is constant but lower when compared to subtype B, a finding which may have broad implications for the design of intervention and treatment strategies for HIV-1 subtype C.
Item A comparative genomics approach towards classifying immunity-related proteins in the tsetse fly (2009) Mpondo, Feziwe; Hide, Winston; Christoffels, Alan
Tsetse flies (Glossina spp.) are vectors of African trypanosome (Trypanosoma spp.) parasites, causative agents of Human African trypanosomiasis (sleeping sickness) and Nagana in livestock. Research suggests that tsetse fly immunity factors are key determinants in the success and failure of infection and the maturation process of parasites. An analysis of tsetse fly immunity factors is limited by the paucity of genomic data for Glossina spp.
Nevertheless, completely sequenced and assembled genomes of Drosophila melanogaster, Anopheles gambiae and Aedes aegypti provide an opportunity to characterize protein families in species such as Glossina by using a comparative genomics approach. In this study we characterize thioester-containing proteins (TEPs), a sub-family of immunity-related proteins, in Glossina by leveraging the EST data for G. morsitans and the genomic resources of D. melanogaster, A. gambiae as well as A. aegypti. A total of 17 TEPs corresponding to Drosophila (four TEPs), Anopheles (eleven TEPs) and Aedes aegypti (two TEPs) were collected from published data supplemented with GenBank searches. In the absence of genome data for G. morsitans, 124 000 G. morsitans ESTs were clustered and assembled into 18 413 transcripts (contigs and singletons). Five Glossina contigs (Gmcn1115, Gmcn1116, Gmcn2398, Gmcn2281 and Gmcn4297) were identified as putative TEPs by BLAST searches. Phylogenetic analyses were conducted to determine the relationship of collected TEP proteins. Gmcn1115 clustered with DmtepI and DmtepII while Gmcn2398 is placed in a separate branch, suggesting that it is specific to G. morsitans. The TEPs are highly conserved within D. melanogaster as reflected in the conservation of the thioester domain, while only two TEPs in A. gambiae and one in A. aegypti show conservation of the thioester domain, suggesting that these proteins are subjected to high levels of selection. Despite the absence of a sequenced genome for G. morsitans, at least two putative TEPs were identified from EST data.
Item A comparative genomics approach towards classifying immunity-related proteins in the tsetse fly (University of the Western Cape, 2009) Mpondo, Feziwe; Hide, Winston; Christoffels, Alan
Tsetse flies (Glossina spp.) are vectors of African trypanosome (Trypanosoma spp.) parasites, causative agents of Human African trypanosomiasis (sleeping sickness) and Nagana in livestock.
Research suggests that tsetse fly immunity factors are key determinants in the success and failure of infection and the maturation process of parasites. An analysis of tsetse fly immunity factors is limited by the paucity of genomic data for Glossina spp. Nevertheless, completely sequenced and assembled genomes of Drosophila melanogaster, Anopheles gambiae and Aedes aegypti provide an opportunity to characterize protein families in species such as Glossina by using a comparative genomics approach. In this study, we characterize thioester-containing proteins (TEPs), a sub-family of immunity-related proteins, in Glossina by leveraging the EST data for G. morsitans and the genomic resources of D. melanogaster, A. gambiae as well as A. aegypti.
Item A computational characterisation of the relationship between genome structure and disease genes (University of the Western Cape, 2012) Kibler, Tracey Deborah; Tiffin, Nicki; Christoffels, Alan
This is a pilot study to investigate the relationship between disease gene status and the structure of the human genome with specific reference to regions of recombination. It compares certain characteristics of a control set of genes, with no reported association or function in any known disease, with a second set of well-curated genes with a known association to a disease. One of the benefits of recombination is the introduction of new combinations of genetic variation in the genome. Recombination hotspots are regions on the chromosome where higher than normal frequencies of breaking and rejoining between homologous chromosomes occur during meiosis. The hotspot regions exhibit both a non-random distribution across the human genome and varying frequencies of breaking and rejoining.
The study analyzed a set of features that represent general properties of human genes, namely base composition (percentage GC content), genetic variation (single nucleotide polymorphisms - SNPs), gene length, and positional effect (distance from chromosome end), in both the disease-associated gene set and the control set. These features were linked to recombination hotspots in the human genome and the frequency of recombination at these hotspots. Descriptive statistics were used to determine differences between the occurrences of these features in disease-associated genes compared to the control set, as well as differences in the occurrence of these same features in the subset of genes containing an internal recombination hotspot compared to the genes with no internal recombination hotspot. The study found that disease-associated genes are generally longer than those in the control set, which is consistent with previous studies. It also found that disease-associated genes are much more likely to contain a recombination hotspot than those genes with no disease association. The study did not, however, find any association between disease gene status and the other features, namely GC content, SNP numbers or the position of a gene on the chromosome. Further analysis of the data suggested that the increased probability of disease-associated genes containing a recombination hotspot is most likely an effect of longer gene length and that the presence of a recombination hotspot is not sufficient in its own right to cause disease gene status.
Item Computational verification of published human mutations (University of the Western Cape, 2008) Kamanu, Frederick Kinyua; Lehväslaiho, Heikki; Bajic, Vladimir; Faculty of Science
The completion of the Human Genome Project, a remarkable feat by any measure, has provided over three billion bases of reference nucleotides for comparative studies.
The next, and perhaps more challenging, step is to analyse sequence variation and relate this information to important phenotypes. Most human sequence variations are characterized by structural complexity and are hence associated with abnormal functional dynamics. This thesis covers the assembly of a computational platform for verifying these variations, based on accurate, published, experimental data.
Item Data Science techniques for predicting plant genes involved in secondary metabolites production (University of the Western Cape, 2018) Muteba, Ben Ilunga; Christoffels, Alan
Plant genome analysis is currently experiencing a boost due to reduced costs associated with the development of next generation sequencing technologies. Knowledge of genetic background can be applied to guide targeted plant selection and breeding, and to facilitate natural product discovery and biological engineering. In medicinal plants, secondary metabolites are of particular interest because they often represent the main active ingredients associated with health-promoting qualities. Plant polyphenols are a highly diverse family of aromatic secondary metabolites that act as antimicrobial agents, UV protectants, and insect or herbivore repellents. Most of the genome mining tools developed to understand genetic materials have very seldom addressed secondary metabolite genes and biosynthesis pathways. Little significant research has been conducted to study key enzyme factors that can predict a class of secondary metabolite genes from polyketide synthases. The objectives of this study were twofold. Primarily, it aimed to identify the biological properties of secondary metabolite genes and the selection of a specific gene, naringenin-chalcone synthase or chalcone synthase (CHS). The study hypothesized that data science approaches in mining biological data, particularly secondary metabolite genes, would enable the compulsory disclosure of some aspects of secondary metabolite (SM).
Secondarily, the aim was to propose a proof of concept for classifying or predicting plant genes involved in polyphenol biosynthesis from data science techniques and convey these techniques in computational analysis through machine learning algorithms and mathematical and statistical approaches. Three specific challenges experienced while analysing secondary metabolite datasets were: 1) class imbalance, which refers to lack of proportionality among protein sequence classes; 2) high dimensionality, which refers to the large feature space that arises when analysing bioinformatics datasets; and 3) the difference in protein sequence lengths, which refers to the fact that protein sequences have different lengths. Considering these inherent issues, developing precise classification models and statistical models proves a challenge. Therefore, the prerequisite for effective SM plant gene mining is dedicated data science techniques that can collect, prepare and analyse SM genes.
Item A deep learning approach to predicting potential virus species crossover using convolutional neural networks and viral protein sequence patterns (University of the Western Cape, 2022) Serage, Rudolph; Anderson, Dominique
Medical science has made substantial progress toward diagnosing, understanding the pathogenesis, and treating various causative agents of infectious disease; however, novel microbial pathogens continue to emerge, and existing pathogens continue to evolve alternative means to thrive in ever-changing environments. Various infectious disease etiological agents originate from animal reservoirs, and many have, over time, acquired the ability to cross the species barrier and alter their host range. The emergence and re-emergence of zoonotic pathogens is reported to be a consequence of changes in several factors, including ecological, behavioural, and socioeconomic variables which are arguably impossible to control.
Computational methods with the capacity to evaluate large datasets are considered invaluable tools for predicting and tracking disease outbreaks and are especially powerful when combined with machine learning techniques.
Item Development and implementation of ontology-based systems for mammalian gene expression profiling (University of the Western Cape, 2009) Kruger, Adele; Hide, Winston
The use of ontologies in the mapping of gene expression events provides an effective and comparable method to determine the expression profile of an entire genome across a large collection of experiments derived from different expression sources. In this dissertation I describe the development of the developmental human and mouse eVOC ontologies and demonstrate the ontologies by identifying genes showing a bias for developmental brain expression in human and mouse, identifying transcription factor complexes, and exploring the mouse orthologs of human cancer/testis genes.
Item Development of a simple artificial intelligence method to accurately subtype breast cancers based on gene expression barcodes (University of the Western Cape, 2018) Esterhuysen, Fanechka Naomi; Gamieldien, Junaid
INTRODUCTION: Breast cancer is a highly heterogeneous disease. The complexity of achieving an accurate diagnosis and an effective treatment regimen lies within this heterogeneity. Subtypes of the disease are not simply molecular, i.e. hormone receptor over-expression or absence, but the tumour itself is heterogeneous in terms of tissue of origin, metastases, and histopathological variability. Accurate tumour classification vastly improves treatment decisions, patient outcomes and 5-year survival rates. Gene expression studies aided by transcriptomic technologies such as microarrays and next-generation sequencing (e.g. RNA-Sequencing) have aided oncology researcher and clinician understanding of the complex molecular portraits of malignant breast tumours.
Mechanisms governing cancers, which include tumorigenesis, gene fusions, gene over-expression and suppression, and cellular process and pathway involvement, have been elucidated through comprehensive analyses of the cancer transcriptome. Over the past 20 years, gene expression signatures discovered with both microarray and RNA-Seq have reached clinical and commercial application through the development of tests such as Mammaprint®, OncotypeDX®, and FoundationOne® CDx, all of which focus on chemotherapy sensitivity, prediction of cancer recurrence, and tumour mutational level. The Gene Expression Barcode (GExB) algorithm was developed to allow for easy interpretation and integration of microarray data through data normalization with frozen RMA (fRMA) preprocessing and conversion of relative gene expression to a sequence of 1's and 0's. Unfortunately, the algorithm has not yet been developed for RNA-Seq data. However, implementation of the GExB with feature-selection would contribute to a machine-learning based robust breast cancer and subtype classifier. METHODOLOGY: For microarray data, we applied the GExB algorithm to generate barcodes for normal breast and breast tumour samples. A two-class classifier for malignancy was developed through feature-selection on barcoded samples by selecting for genes with 85% stable absence or presence within a tissue type, and differentially stable between tissues. A multi-class feature-selection method was employed to identify genes with variable expression in one subtype, but 80% stable absence or presence in all other subtypes, i.e. 80% in n-1 subtypes. For RNA-Seq data, a barcoding method needed to be developed which could mimic the GExB algorithm for microarray data. A z-score-to-barcode method was implemented, together with differential gene expression analysis and selection of the top 100 genes as informative features for classification purposes.
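The barcoding and two-class feature-selection ideas described in the methodology can be sketched roughly as follows. This is an illustrative sketch only, not the thesis implementation: the function names, the per-gene z-score across samples, and the cut-off of 0 are assumptions, and the 85% stability filter (described in the abstract for microarray barcodes) is applied here to z-score barcodes purely for demonstration.

```python
from statistics import mean, pstdev


def zscore_barcode(expression, threshold=0.0):
    """Convert expression values to 0/1 barcodes (hypothetical helper).

    expression: dict mapping gene name -> list of values, one per sample.
    A gene is called present (1) in a sample when its z-score, computed
    across all samples for that gene, exceeds `threshold` (assumed cut-off).
    """
    barcodes = {}
    for gene, values in expression.items():
        mu = mean(values)
        sigma = pstdev(values)
        if sigma == 0:
            # No variation across samples: call the gene absent everywhere.
            barcodes[gene] = [0] * len(values)
            continue
        barcodes[gene] = [1 if (v - mu) / sigma > threshold else 0 for v in values]
    return barcodes


def stable_state(bits, stability=0.85):
    """Return 1 (stably present), 0 (stably absent) or None (unstable)."""
    fraction_present = sum(bits) / len(bits)
    if fraction_present >= stability:
        return 1
    if fraction_present <= 1 - stability:
        return 0
    return None


def select_features(barcodes, labels, stability=0.85):
    """Keep genes that are stable within both classes but differ between them."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    selected = []
    for gene, bits in barcodes.items():
        states = [
            stable_state([b for b, lab in zip(bits, labels) if lab == c], stability)
            for c in classes
        ]
        if None not in states and states[0] != states[1]:
            selected.append(gene)
    return selected


# Toy data: three normal and three tumour samples (made-up values).
expression = {
    "G1": [1, 1, 1, 9, 9, 9],   # absent in normal, present in tumour
    "G2": [5, 5, 5, 5, 5, 5],   # constant, uninformative
    "G3": [1, 9, 1, 9, 1, 9],   # unstable within each class
}
labels = ["normal"] * 3 + ["tumour"] * 3
barcodes = zscore_barcode(expression)
informative = select_features(barcodes, labels)
```

On this toy input only `G1` survives the filter, because it is the only gene whose barcode is stable within each class and differs between the classes. The selected genes would then be passed to a classifier such as an SVM.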
The accuracy and discriminatory capability of both microarray-based gene signatures and the RNA-Seq-based gene signatures were assessed through unsupervised and supervised machine-learning algorithms, i.e., K-means and Hierarchical clustering, as well as binary and multi-class Support Vector Machine (SVM) implementations. RESULTS: The GExB-FS method for microarray data yielded an 85-probe and 346-probe informative set for two-class and multi-class classifiers, respectively. The two-class classifier predicted samples as either normal or malignant with 100% accuracy and the multi-class classifier predicted molecular subtype with 96.5% accuracy with SVM. Combining RNA-Seq DE analysis for feature-selection with the z-score-to-barcode method resulted in a two-class classifier for malignancy, and a multi-class classifier for normal-from-healthy, normal-adjacent-tumour (from cancer patients), and breast tumour samples with 100% accuracy. Most notably, a normal-adjacent-tumour gene expression signature emerged, which differentiated it from normal breast tissues in healthy individuals. CONCLUSION: A potentially novel method for microarray and RNA-Seq data transformation, feature selection and classifier development was established. The universal application of the microarray signatures and validity of the z-score-to-barcode method was proven with 95% accurate classification of RNA-Seq barcoded samples with a microarray-discovered gene expression signature.
The results from this comprehensive study into the discovery of robust gene expression signatures hold immense potential for further R&D towards implementation at the clinical endpoint, and translation to simpler and cost-effective laboratory methods such as qtPCR-based tests.
Item The development of a single nucleotide polymorphism database for forensic identification of specified physical traits (University of the Western Cape, 2009) Naidu, Alecia Geraldine; Bajic, Vladimir; Faculty of Science
Many Single Nucleotide Polymorphisms (SNPs) found in coding or regulatory regions within the human genome lead to phenotypic differences that make prediction of physical appearance, based on genetic analysis, potentially useful in forensic investigations. Complex traits such as pigmentation can be predicted from the genome sequence, provided that genes with strong effects on the trait exist and are known. Phenotypic traits may also be associated with variations in gene expression due to the presence of SNPs in promoter regions. In this project, genes associated with these physical traits of potential forensic relevance have been collated from the literature using a text mining platform and hand curation. The SNPs associated with these genes have been acquired from public SNP repositories such as the International HapMap project, dbSNP and Ensembl. Characterization of different population groups based on the SNPs has been performed and the results and data stored in a MySQL database. This database contains SNP genotyping data with respect to physical phenotypic differences of forensic interest. The potential forensic relevance of the SNP information contained in this database has been verified through in silico SNP analysis aimed at establishing possible relationships between SNP occurrence and phenotype. The software used for this analysis is MATCH™.
Data management and access have been enhanced by the use of a functional web-based front-end which enables users to extract and display SNP information without running complex Structured Query Language (SQL) statements from the command line. This Forensic SNP Phenotype resource can be accessed at http://forensic.sanbi.ac.za/alecia_forensics/Index.html
Item Development of Open source Laboratory Information Management System (LIMS) For Human Biobanking (University of the Western Cape, 2018) Ademuyiwa, Toluwaleke; Christoffels, Alan
Biobanks are collections of biological samples and associated data for future use. The day-to-day activities in a biobank laboratory are underpinned by a laboratory information management system (LIMS). For example, the LIMS manages the execution of tests on biospecimens and tracks their movement and processing through the laboratory. There is a range of commercially available biobank LIMS systems on the market, but their costs are prohibitive in a resource-limited setting. The cost of commercial off-the-shelf software includes the initial cost of acquiring the system, as well as the cost of maintenance and support throughout the software's life cycle. The Bika LIMS system, on the other hand, is free and open source software (FOSS) with decreased license cost, used routinely in non-medical laboratories. Ideally, if Bika LIMS could be customised to handle human biospecimens, then both biobanks and genetics laboratories could benefit. Central to any biobank functionality in Bika LIMS is the ability to import information from routine biomedical equipment. We identified two instruments that are key to human biobanking and are lacking in Bika LIMS, namely the BioDrop µLITE and the Qubit Fluorometric instrument. Import interfaces for importing DNA/RNA concentration analyses from these instruments and management of the results with associated sample information would add value to the LIMS.
The aim of the thesis was to customise Bika LIMS for utility in a biomedical laboratory. In collaboration with colleagues at Tygerberg medical school, the Bika LIMS software was customised to accommodate the DNA and RNA concentration analysis results for a pathology laboratory and the LIMS workflows customised for use at Tygerberg medical school. In this process the manual operations of the Tygerberg medical school laboratory would migrate to the use of Bika LIMS. The analytical module in Bika LIMS was implemented using Python, by using logic that allows importing of specific analyses. A template was created for the BioDrop µLITE and Qubit Fluorometric instruments used for developing the interface for an analysis import form. The instruments generate results in CSV file format. A parser was created to read and parse the files uploaded from the import form, by splitting them into parts, extracting the data, and populating key-value pairs. The controller manages the submission of the form by initialising the parser that imports the specific file into the LIMS where it is managed by the configured Bika LIMS workflow.
Item Effects of nucleotide variation on the structure and function of human arylamine n-acetyltransferase 1 (2012) Akurugu, Wisdom Alemya; Christoffels, Alan
The human arylamine N-acetyltransferase 1 (NAT1) is critical in determining the duration of action and pharmacokinetics of amine-containing drugs such as para-aminosalicylic acid and para-aminobenzoyl glutamate used in clinical therapy of tuberculosis (TB), as well as influencing the balance between detoxification and metabolic activation of these drugs. SNPs in this enzyme are continuously being detected and indicate inter-ethnic and inter-individual variation in the enzyme function. The effect of nsSNPs on the structure and function of proteins are routinely analyzed using SIFT and POLYPHEN-2 prediction algorithms. The false-negative rate of these two algorithms can be as high as 25% of nsSNPs.
This study aimed to explore the use of homology modeling including residue interactions, Gibbs free energy change and solvent accessibility as additional evidence for predicting nsSNP effects on enzyme function. This study evaluated the functional effects of 14 nsSNPs identified in a South African mixed ancestry population of which 3 nsSNPs were previously identified in Caucasians. The SNPs were evaluated using structural analysis that included homology modeling, residue interactions, relative solvent accessibility, Gibbs free energy change and sequence conservation in addition to the routinely used nsSNP function prediction algorithms, SIFT and POLYPHEN-2. The structural analysis implemented in this study showed a loss of hydrogen bonds for S259R thereby affecting protein function which contradicts predictions obtained from SIFT and POLYPHEN-2 algorithms. The variant N245I was shown to be neutral but contradicted the predictions from SIFT and POLYPHEN-2. Structural analysis predicted that variant R242M would affect protein stability and therefore NAT1 function in agreement with POLYPHEN-2 predictions but contradicting predictions from SIFT. No structural changes were expected for variant E264K in agreement with predictions obtained from POLYPHEN-2 but contradicting results from SIFT. The functions of the remaining 10 nsSNPs were consistent with those predicted by SIFT and POLYPHEN-2 namely that four variants R117T, E167Q, T193S and T240S do not affect the NAT1 function whereas R166T, F202V, Q210P, D229H, V231G and V235A could affect the enzyme function. This study provided the first evaluation of the functional effects of 11 newly characterized nsSNPs on the NAT1 tuberculosis drug-metabolizing enzyme. The six functionally important nsSNPs predicted by all three methods and the four SNPs with contradictory results will be tested experimentally by creating a SNP construct that will be cloned into an expression vector.
These combined computational and experimental studies will advance our understanding of NAT1 structure-function relationships and allow us to interpret the NAT1 genetic polymorphisms in individuals who are slow or fast acetylators. The results, albeit from a small dataset, demonstrate that the routinely used algorithms are not without flaws and that improvements in functional prediction of nsSNPs can be obtained by close scrutiny of the molecular interactions of wild type and variant amino acids.
Item Enabling the processing of bioinformatics workflows where data is located through the use of cloud and container technologies (University of the Western Cape, 2019) de Beste, Eugene; Christoffels, Alan
The growing size of raw data and the lack of internet communication technology to keep up with that growth is introducing unique challenges to academic researchers. This is especially true for those residing in rural areas or countries with sub-par telecommunication infrastructure. In this project I investigate the usefulness of cloud computing technology, data analysis workflow languages and portable computation for institutions that generate data. I introduce the concept of a software solution that could be used to simplify the way that researchers execute their analysis on data sets at remote sources, rather than having to move the data. The scope of this project involved conceptualising and designing a software system to simplify the use of a cloud environment as well as implementing a working prototype of said software for the OpenStack cloud computing platform.
I conclude that it is possible to improve the performance of research pipelines by removing the need for researchers to have operating system or cloud computing knowledge and that utilising technologies such as this can ease the burden of moving data.
Item Establishing a framework for an African Genome Archive (University of the Western Cape, 2019) Southgate, Jamie; Christoffels, Alan
The generation of biomedical research data on the African continent is growing, with numerous studies realizing the importance of African genetic diversity in discoveries of human origins and disease susceptibility. The decrease in costs to purchase and utilize such tools has enabled research groups to produce datasets of significant scientific value. However, this success story has resulted in a new challenge for African researchers and institutions. An increase in data scale and complexity has led to an imbalance of infrastructure and skills to manage, store and analyse this data.
Item An evaluation of galaxy and ruffus-scripting workflows system for DNA-seq analysis (University of the Western Cape, 2018) Oluwaseun, Ajayi Olabode; Christoffels, Alan
Functional genomics determines the biological functions of genes on a global scale by using large volumes of data obtained through techniques including next-generation sequencing (NGS). The application of NGS in biomedical research is gaining momentum, and with its adoption becoming more widespread, there is an increasing need for access to customizable computational workflows that can simplify, and offer access to, computer-intensive analyses of genomic data. In this study, the Galaxy and Ruffus frameworks were designed and implemented with a view to address the challenges faced in biomedical research. Galaxy, a graphical web-based framework, allows researchers to build a graphical NGS data analysis pipeline for accessible, reproducible, and collaborative data-sharing.
Ruffus, a UNIX command-line framework used by bioinformaticians as a Python library to write scripts in object-oriented style, allows for building a workflow in terms of task dependencies and execution logic. In this study, a dual data analysis technique was explored which focuses on a comparative evaluation of Galaxy and Ruffus frameworks that are used in composing analysis pipelines. To this end, we developed an analysis pipeline in both Galaxy and Ruffus for the analysis of Mycobacterium tuberculosis sequence data. Furthermore, this study aimed to compare the Galaxy framework to Ruffus, with preliminary analysis revealing that the analysis pipeline in Galaxy displayed a higher percentage of load and store instructions. In comparison, pipelines in Ruffus tended to be CPU bound and memory intensive. The CPU usage, memory utilization, and runtime execution are graphically represented in this study. Our evaluation suggests that workflow frameworks have distinctly different features, from ease of use, flexibility, and portability to architectural designs.

Item An evolutionary genomics approach towards analysis of genes implicated in transmission of trypanosomes between tsetse fly and mammalian host (2009) Mwangi, Sarah Wambui; Christoffels, Alan
Human African trypanosomiasis is the world’s third most important parasitic disease affecting human health after malaria and schistosomiasis. The World Health Organization estimates that approximately 60 million people are at risk in sub-Saharan Africa, with up to 50,000 deaths per year caused by trypanosomiasis. Current management of human African trypanosomiasis relies on active surveillance and chemotherapy of infected patients. Efforts to develop a vaccine to immunize the human host have been hampered by antigenic variation of the parasite's cell coat.
The advent of the genome era has opened up opportunities for developing novel strategies for interrupting the transmission cycle of trypanosomes, specifically using any of the three players: the human host, the tsetse fly vector and/or the parasite. The human genome has been deciphered and the genomes of several trypanosome species have been sequenced. Sequencing of additional neglected trypanosome species is in progress. The tsetse fly genome is currently being sequenced as part of the genomic activities of the International Glossina Genome Initiative (IGGI). In an attempt to support the tsetse fly sequencing effort, expressed sequence tags (ESTs) from various tissues and developmental stages of Glossina morsitans have been generated.
In this study, tsetse fly EST data was analyzed using bioinformatics approaches, focusing on transcripts encoding serpin genes implicated in the immune defenses of tsetse flies. Glossina morsitans homologues to Drosophila melanogaster serpin4, serpin5, and serpin27A and Anopheles gambiae serpin10 were identified in the tsetse fly EST contigs. Comparison of the reactive center loop of tsetse fly serpins with human α-1-antitrypsin suggests that these tsetse serpins are inhibitory. Preliminary EST clustering did not succeed in assembling 3564 Tsal-encoded ESTs into one contig. In this study, these ESTs were assembled together with three published Tsal cDNAs. A total of 29 Tsal-encoded contigs were generated. An analysis of the sequence variation within the Tsal EST assembled contigs identified five single base mismatches, namely A-T, T-A, G-T and T-G.
Results from this study form a basis on which genetic and biochemical experimental studies can be designed, a process that will be successfully carried out once we have a reference genome. Specifically, these include studies aimed at genetic modification of tsetse flies towards populations that are inhospitable to trypanosomes.
Ultimately, this will supplement current vector control strategies towards elimination of human African trypanosomiasis.

Item Exploring the influence of organisational, environmental, and technological factors on information security policies and compliance at South African higher education institutions: Implications for biomedical research. (University of Western Cape, 2020) Abiodun, Oluwafemi Peter; Christoffels, Alan; Anderson, Dominique
Headline reports on data breaches worldwide have resulted in heightened concerns about information security vulnerability. In Africa, South Africa is ranked among the top ‘at-risk’ countries with information security vulnerabilities and is the most cybercrime-targeted country. Globally, such cyber vulnerability incidents greatly affect the education sector, due, in part, to the fact that it holds more Personal Identifiable Information (PII) than other sectors. PII refers to (but is not limited to) ID numbers, financial account numbers, and biomedical research data.

Item Exploring the influence of organisational, environmental, and technological factors on information security policies and compliance at South African higher education institutions: Implications for biomedical research. (University of the Western Cape, 2020) Abiodun, Oluwafemi Peter; Christoffels, Alan
Headline reports on data breaches worldwide have resulted in heightened concerns about information security vulnerability. In Africa, South Africa is ranked among the top ‘at-risk’ countries with information security vulnerabilities and is the most cybercrime-targeted country. Globally, such cyber vulnerability incidents greatly affect the education sector, due, in part, to the fact that it holds more Personal Identifiable Information (PII) than other sectors. PII refers to (but is not limited to) ID numbers, financial account numbers, and biomedical research data.
In response to rising threats, South Africa has implemented a regulation called the Protection of Personal Information Act (POPIA), similar to the European Union General Data Protection Regulation (GDPR), which seeks to mitigate cybercrime and information security vulnerabilities. The extent to which African institutions, especially in South Africa, have embraced and responded to these two information security regulations remains vague, making it a crucial matter for biomedical researchers. This study aimed to assess whether the participating universities have proper and reliable information security practices, measures and management in place and whether they fall in line with both national (POPIA) and international (GDPR) regulations. In order to achieve this aim, the study undertook a qualitative exploratory analysis of information security management across three universities in South Africa. A Technology, Organizational, and Environmental (TOE) model was employed to investigate factors that may influence effective information security measures. A purposeful sampling method was employed to interview participants from each university. From the technological standpoint, the Bring Your Own Device (BYOD) policy, whereby, on average, a student owns and connects between three and four internet-enabled devices to the network, has created difficulties for IT teams, particularly in the areas of authentication, explosive growth in bandwidth, and access control to secure university servers. In order to develop robust solutions to mitigate these concerns, and which are not perceived by users as overly prohibitive, executive management should acknowledge that security and privacy issues are a universal problem and not solely an IT problem, and equip the IT teams with the necessary tools and mechanisms to allow them to overcome commonplace challenges.
At an organisational level, information security awareness training of all users within the university setting was identified as a key factor in protecting the integrity, confidentiality, and availability of information in highly networked environments. Furthermore, the University’s information security mission must not simply be a link on a website; it should be constantly reinforced by informing users during, and after, the awareness training. In terms of environmental factors, specifically the GDPR and POPIA legislations, one of the most practical and cost-effective ways universities can achieve data compliance requirements is to help staff (both teaching and non-teaching), students, and other employees understand the business value of all information. Users who are more aware of the sensitivity of data, risks to the data, and their responsibilities when handling, storing, processing, and distributing data during their day-to-day activities will behave in a manner that makes compliance easier at the institutional level. Results obtained in this study helped to elucidate the current status, issues, and challenges which universities are facing in the area of information security management and compliance, particularly in the South African context. Findings from this study point to organizational factors being the most critical when compared to the technological and environmental contexts examined. Furthermore, several proposed information security policies were developed with a view to assisting biomedical practitioners within the institutional setting in protecting sensitive biomedical data.