Browsing by Author "Gamieldien, Junaid"
Now showing 1 - 20 of 27
Item A 35-gene signature discriminates between rapidly- and slowly-progressing glioblastoma multiforme and predicts survival in known subtypes of the cancer (BioMed Central, 2018) Fatai, Azeez A.; Gamieldien, Junaid
BACKGROUND: Gene expression can be employed for the discovery of prognostic gene or multigene signatures in cancer. In this study, we assessed the prognostic value of a 35-gene expression signature selected by pathway and machine learning based methods in adjuvant therapy-linked glioblastoma multiforme (GBM) patients from the Cancer Genome Atlas. METHODS: Genes with high expression variance were subjected to pathway enrichment analysis, and those having roles in chemoradioresistance pathways were used in expression-based feature selection. A modified Support Vector Machine Recursive Feature Elimination algorithm was employed to select a subset of these genes that discriminated between rapidly-progressing and slowly-progressing patients. RESULTS: Survival analysis on TCGA samples not used in feature selection and samples from four GBM subclasses, as well as from an entirely independent study, showed that the 35-gene signature discriminated between the survival groups in all cases (p < 0.05) and could accurately predict survival irrespective of the subtype. In a multivariate analysis, the signature predicted progression-free and overall survival independently of other factors considered. CONCLUSION: We propose that the performance of the signature makes it an attractive candidate for further studies to assess its utility as a clinical prognostic and predictive biomarker in GBM patients.
Additionally, the signature genes may also be useful therapeutic targets to improve both progression-free and overall survival in GBM patients.
Item Aberrations in the Retinoblastoma susceptibility gene in tumours from South Africa oesophageal cancer patients (University of the Western Cape, 1996) Gamieldien, Junaid; Hendricks, D.T; Smith, A; Parker, M.I
Little is known about the genetic events occurring in oesophageal cancer and very few studies have been undertaken to analyse oesophageal tumours from South African patients in this regard. Inactivation of numerous tumour suppressor genes, including the Rb gene, has been implicated in oesophageal tumourigenesis in different populations. This study had two objectives. The first was to develop a procedure for the simultaneous extraction of DNA and RNA from small (ca. 25 mg) oesophageal biopsy samples. The procedure developed here has proven to be rapid and cost effective, and consistently produced excellent yields of high-quality DNA and RNA. It remains to be determined, however, whether long-term storage affects the integrity of the isolated RNA. The second and primary objective of this study was to determine whether the Rb gene is involved in oesophageal tumourigenesis in South African patients. Loss of Heterozygosity analysis using a VNTR marker in intron 20 and a microsatellite marker in intron 4 of the Rb gene revealed that Rb-allelic loss had occurred in 50% of the thirty-three patients analysed. Furthermore, microsatellite instability was demonstrated at the intron 4 marker in 15% of the patients analysed. Mutation screening of exons 17 and 21 of the Rb gene, frequently mutated in oesophageal tumours from Chinese patients, in twenty samples using the mutation screening techniques of SSCP and heteroduplex analysis, followed by DNA sequencing of putative positives, revealed no positive mutations.
However, the high percentage of allelic loss found suggests that the Rb gene is inactivated in the progression of South African oesophageal tumours. Furthermore, the microsatellite instability suggests that defective DNA repair may also play a role in oesophageal tumourigenesis.
Item The African Coelacanth genome provides insights into tetrapod evolution (Macmillan Publishers, 2013) Christoffels, Alan; Hesse, Uljana; Gamieldien, Junaid; Panji, Sumir; Picone, Barbara; Van Heusden, Peter
The discovery of a living coelacanth specimen in 1938 was remarkable, as this lineage of lobe-finned fish was thought to have become extinct 70 million years ago. The modern coelacanth looks remarkably similar to many of its ancient relatives, and its evolutionary proximity to our own fish ancestors provides a glimpse of the fish that first walked on land. Here we report the genome sequence of the African coelacanth, Latimeria chalumnae. Through a phylogenomic analysis, we conclude that the lungfish, and not the coelacanth, is the closest living relative of tetrapods. Coelacanth protein-coding genes are significantly more slowly evolving than those of tetrapods, unlike other genomic features. Analyses of changes in genes and regulatory elements during the vertebrate adaptation to land highlight genes involved in immunity, nitrogen excretion and the development of fins, tail, ear, eye, brain and olfaction.
Functional assays of enhancers involved in the fin-to-limb transition and in the emergence of extra-embryonic tissues show the importance of the coelacanth genome as a blueprint for understanding tetrapod evolution.
Item Application of an in silico approach identifies a genetic locus within ITGB2, and its interactions with HSPG2 and FGF9, to be associated with anterior cruciate ligament rupture risk (Taylor and Francis Group, 2023) Dlamini, Senanile B.; Saunders, Colleen J.; Gamieldien, Junaid
We developed a Biomedical Knowledge Graph model that is phenotype and biological function-aware through integrating knowledge from multiple domains in a Neo4j graph database. All known human genes were assessed through the model to identify potential new risk genes for anterior cruciate ligament (ACL) ruptures and Achilles tendinopathy (AT). Genes were prioritised and explored in a case–control study comparing participants with ACL ruptures (ACL-R), including a sub-group with non-contact mechanism injuries (ACL-NON), to uninjured control individuals (CON). After gene filtering, 3376 genes, including 411 genes identified through previous whole exome sequencing, were found to be potentially linked to AT and ACL ruptures. Four variants were prioritised: HSPG2:rs2291826 A/G, HSPG2:rs2291827 G/A, ITGB2:rs2230528 C/T and FGF9:rs2274296 C/T. The rs2230528 CC genotype was over-represented in the CON group compared to ACL-R (p < 0.001) and ACL-NON (p < 0.001), and the TT genotype and T allele were over-represented in the ACL-R and ACL-NON groups compared to the CON group (p < 0.001).
Several significant differences in distributions were noted for the gene-gene interactions: (HSPG2:rs2291826, rs2291827 and ITGB2:rs2230528) and (ITGB2:rs2230528 and FGF9:rs2297429).
Item Computational analysis of multilevel omics data for the elucidation of molecular mechanisms of cancer (University of the Western Cape, 2015) Fatai, Azeez Ayomide; Gamieldien, Junaid
Cancer is a group of diseases that arises from irreversible genomic and epigenomic alterations that result in unrestrained proliferation of abnormal cells. Detailed understanding of the molecular mechanisms underlying a cancer would aid the identification of most, if not all, genes responsible for its progression and the development of molecularly targeted chemotherapy. The challenge of recurrence after treatment shows that our understanding of cancer mechanisms is still poor. As a contribution to overcoming this challenge, we provide an integrative multi-omic analysis on glioblastoma multiforme (GBM) for which large data sets on different classes of genomic and epigenomic alterations have been made available in the Cancer Genome Atlas data portal. The first part of this study involves protein network analysis for the elucidation of GBM tumourigenic molecular mechanisms, identification of driver genes, prioritization of genes in chromosomal regions with copy number alteration, and co-expression and transcriptional analysis. Functional modules were obtained by edge-betweenness clustering of a protein network constructed from genes with predicted functional impact mutations and differentially expressed genes. Pathway enrichment analysis was performed on each module to identify statistical overrepresentation of signaling pathways. Known and novel candidate cancer driver genes were identified in the modules, and functionally relevant genes in chromosomal regions altered by homologous deletion or high-level amplification were prioritized with the protein network.
Co-expressed modules enriched in cancer biological processes and transcription factor targets were identified using network genes that demonstrated high expression variance. Our findings show that GBM's molecular mechanisms are much more complex than those reported in previous studies. We next identified differentially expressed miRNAs for which target genes associated with the protein network were also differentially expressed. MiRNAs and target genes were prioritized based on the number of targeted genes and targeting miRNAs, respectively. MiRNAs that correlated with time to progression were selected by an elastic net-penalized Cox regression model for survival analysis. These miRNAs were combined into a signature that independently predicted adjuvant therapy-linked progression-free survival in GBM and its subtypes and overall survival in GBM. The results show that miRNAs play significant roles in GBM progression and patients' survival. Finally, a prognostic mRNA signature that independently predicted progression-free and overall survival was identified. Pathway enrichment analysis was carried out on genes with high expression variance across a cohort to identify those in chemoradioresistance-associated pathways. A support vector machine-based method was then used to identify a set of genes that discriminated between rapidly- and slowly-progressing GBM patients, with a minimal 5% cross-validation error rate. The prognostic value of the gene set was demonstrated by its ability to predict adjuvant therapy-linked progression-free and overall survival in GBM and its subtypes and was validated in an independent data set. We have identified a set of genes involved in tumourigenic mechanisms that could potentially be exploited as targets in drug development for the treatment of primary and recurrent GBM.
Furthermore, given their demonstrated accuracy in this study, the identified miRNA and mRNA signatures have strong potential to be combined and developed into a robust clinical test for predicting prognosis and treatment response.
Item Computational characterization of iron metabolism in the tsetse disease vector, Glossina morsitans: IRE stem-loops (BMC, 2016) Dashti, Zahra Jalali Sefid; Gamieldien, Junaid; Christoffels, Alan
Iron metabolism and regulation is an indispensable part of species survival, most importantly for blood feeding insects. Iron regulatory proteins are central regulators of iron homeostasis, whose binding to iron response element (IRE) stem-loop structures within the UTRs of genes regulates expression at the post-transcriptional level. Despite the extensive literature on the mechanism of iron regulation in humans, less attention has been given to insects and, more specifically, the blood feeding insects, where research has mainly focused on the characterization of ferritin and transferrin. We thus examined the mechanism of iron homeostasis through a genome-wide computational identification of IREs and other enriched motifs in the UTRs of Glossina morsitans with the view to identify new IRE-regulated genes.
Item Computational genomics approaches for kidney diseases in Africa (University of the Western Cape, 2015) Mapiye, Darlington Shingirirai; Tiffin, Nicki; Gamieldien, Junaid
End stage renal disease (ESRD), a more severe form of kidney disease, is considered to be a complex trait that may involve multiple processes which work together on a background of a significant genetic susceptibility. Black Africans have been shown to bear an unequal burden of this disease compared to white Europeans, Americans and Caucasians. Despite this, most of the genetic and epidemiological advances made in understanding the aetiology of kidney diseases have been made in populations outside of sub-Saharan Africa (SSA).
Very little research has been undertaken to investigate key genetic factors that drive ESRD in Africans compared to patients from the rest of the world. Therefore, the primary aim of this Bioinformatics thesis was twofold: firstly, to develop and apply a whole exome sequencing (WES) analysis pipeline and use it to understand a genetic mechanism underlying ESRD in a South African population of mixed ancestry. I hypothesized that the pipeline would enable the discovery of highly penetrant rare variants with large effect size, which are expected to explain an important fraction of the genetic aetiology and pathogenesis of ESRD in these African patients. Secondly, the aim was to develop and set up a multicenter clinical database that would capture a plethora of clinical data for patients with Lupus, one of the risk factors of ESRD. From WES of six family members (five cases and one control), a total of 23 196 SNVs, 1445 insertions and 1340 deletions overlapped amongst all affected family members. The variants were consistent with an autosomal dominant inheritance pattern inferred in this family. Of these, only 1550 SNVs, 67 insertions and 112 deletions were present in all affected family members but absent in the unaffected family member. Following detailed evaluation of evidence for variant implication and pathogenicity, only 3 very rare heterozygous missense variants in 3 genes, COL4A1 [p.R476W], ICAM1 [p.P352L] and COL16A1 [p.T116M], were considered potentially disease causing. Computational relatedness analysis revealed the approximate amount of DNA shared by family members and confirmed reported relatedness. Genotyping for the Y chromosome was additionally performed to assist in confirming sample identity. The clinical database has been designed and is being piloted at Groote Schuur medical Hospital at the University of Cape Town. Currently, about 290 patients have already been entered in the registry.
The resources and methodologies developed in this thesis have the potential to contribute not only to the understanding of ESRD and its risk factors, but to the successful application of WES in clinical practice. Importantly, it contributes significant information on the genetics of ESRD based on an African family and will also improve scientific infrastructure on the African continent. Clinical databasing will go a long way to enable clinicians to collect and store standardised clinical data for their patients.
Item Defining the molecular signatures of Achilles tendinopathy and anterior cruciate ligament ruptures: A whole-exome sequencing approach (Public Library of Science, 2018) Gibbon, Andrea; Saunders, Colleen J.; Collins, Malcolm; Gamieldien, Junaid; September, Alison V.
Musculoskeletal soft tissue injuries are complex phenotypes with genetics being one of many proposed risk factors. Case-control association studies using the candidate gene approach have predominantly been used to identify risk loci for these injuries. However, it is unlikely that all risk-conferring variants can be identified using this approach alone. Therefore, this study aimed to further define the genetic profile of these injuries using an integrated omics approach involving whole exome sequencing and a customised analysis pipeline. The exomes of ten exemplar asymptomatic controls and ten exemplar cases with Achilles tendinopathy were individually sequenced using a platform that included the coverage of the untranslated regions and miRBase miRNA genes. Approximately 200 000 variants were identified in the sequenced samples. Previous research was used to guide a targeted analysis of the genes encoding the tenascin-C (TNC) glycoprotein and the α1 chain of type XXVII collagen (COL27A1) located on chromosome 9. Selection of variants within these genes was, however, not predetermined but based on a tiered filtering strategy.
Four variants in TNC (rs1061494, rs1138545, rs2104772 and rs1061495) and three variants in the upstream COL27A1 gene (rs2567706, rs2241671 and rs2567705) were genotyped in larger Achilles tendinopathy and anterior cruciate ligament (ACL) rupture sample groups. The CC genotype of TNC rs1061494 (C/T) was associated with the risk of Achilles tendinopathy (p = 0.018, OR: 2.5, 95% CI: 1.2–5.1). Furthermore, the AA genotype of the TNC rs2104772 (A/T) variant was significantly associated with ACL ruptures in the female subgroup (p = 0.035, OR: 2.3, 95% CI: 1.1–5.5). An inferred haplotype in the TNC gene was also associated with the risk of Achilles tendinopathy. These results provide a proof of concept for the use of a customised pipeline for the exploration of a larger genomic dataset. This approach, using previous research to guide a targeted analysis of the data, has generated new genetic signatures in the biology of musculoskeletal soft tissue injuries.
Item Development of a simple artificial intelligence method to accurately subtype breast cancers based on gene expression barcodes (University of the Western Cape, 2018) Esterhuysen, Fanechka Naomi; Gamieldien, Junaid
INTRODUCTION: Breast cancer is a highly heterogeneous disease. The complexity of achieving an accurate diagnosis and an effective treatment regimen lies within this heterogeneity. Subtypes of the disease are not simply molecular, i.e. hormone receptor over-expression or absence, but the tumour itself is heterogeneous in terms of tissue of origin, metastases, and histopathological variability. Accurate tumour classification vastly improves treatment decisions, patient outcomes and 5-year survival rates. Gene expression studies aided by transcriptomic technologies such as microarrays and next-generation sequencing (e.g. RNA-Sequencing) have aided oncology researcher and clinician understanding of the complex molecular portraits of malignant breast tumours.
Mechanisms governing cancers, which include tumorigenesis, gene fusions, gene over-expression and suppression, cellular process and pathway involvement, have been elucidated through comprehensive analyses of the cancer transcriptome. Over the past 20 years, gene expression signatures discovered with both microarray and RNA-Seq have reached clinical and commercial application through the development of tests such as Mammaprint®, OncotypeDX®, and FoundationOne® CDx, all of which focus on chemotherapy sensitivity, prediction of cancer recurrence, and tumour mutational level. The Gene Expression Barcode (GExB) algorithm was developed to allow for easy interpretation and integration of microarray data through data normalization with frozen RMA (fRMA) preprocessing and conversion of relative gene expression to a sequence of 1's and 0's. Unfortunately, the algorithm has not yet been developed for RNA-Seq data. However, implementation of the GExB with feature-selection would contribute to a machine-learning based robust breast cancer and subtype classifier. METHODOLOGY: For microarray data, we applied the GExB algorithm to generate barcodes for normal breast and breast tumour samples. A two-class classifier for malignancy was developed through feature-selection on barcoded samples by selecting for genes with 85% stable absence or presence within a tissue type, and differentially stable between tissues. A multi-class feature-selection method was employed to identify genes with variable expression in one subtype, but 80% stable absence or presence in all other subtypes, i.e. 80% in n-1 subtypes. For RNA-Seq data, a barcoding method needed to be developed which could mimic the GExB algorithm for microarray data. A z-score-to-barcode method was implemented, together with differential gene expression analysis and selection of the top 100 genes as informative features for classification purposes.
The accuracy and discriminatory capability of both microarray-based gene signatures and the RNA-Seq-based gene signatures were assessed through unsupervised and supervised machine-learning algorithms, i.e., K-means and Hierarchical clustering, as well as binary and multi-class Support Vector Machine (SVM) implementations. RESULTS: The GExB-FS method for microarray data yielded an 85-probe and 346-probe informative set for two-class and multi-class classifiers, respectively. The two-class classifier predicted samples as either normal or malignant with 100% accuracy and the multi-class classifier predicted molecular subtype with 96.5% accuracy with SVM. Combining RNA-Seq DE analysis for feature-selection with the z-score-to-barcode method resulted in a two-class classifier for malignancy, and a multi-class classifier for normal-from-healthy, normal-adjacent-tumour (from cancer patients), and breast tumour samples with 100% accuracy. Most notably, a normal-adjacent-tumour gene expression signature emerged, which differentiated it from normal breast tissues in healthy individuals. CONCLUSION: A potentially novel method for microarray and RNA-Seq data transformation, feature selection and classifier development was established. The universal application of the microarray signatures and validity of the z-score-to-barcode method was proven with 95% accurate classification of RNA-Seq barcoded samples with a microarray discovered gene expression signature.
The results from this comprehensive study into the discovery of robust gene expression signatures hold immense potential for further R&D towards implementation at the clinical endpoint, and translation to simpler and cost-effective laboratory methods such as qtPCR-based tests.
Item Exome sequencing identifies novel dysferlin mutation in a family with pauci-symptomatic heterozygous carriers (Springer Nature, 2018) Jalali-Sefid-Dashti, Mahjoubeh; Nel, Melissa; Heckmann, Jeannine M.; Gamieldien, Junaid
Background: We investigated a South African family of admixed ancestry in which the first generation (G1) developed insidious progressive distal to proximal weakness in their twenties, while their offspring (G2) experienced severe unexpected symptoms of myalgia and cramps since adolescence. Our aim was to identify deleterious mutations that segregate with the affected individuals in this family. Methods: Exome sequencing was performed on five cases, which included three affected G1 siblings and two pauci-symptomatic G2 offspring. As controls we included an unaffected G1 sibling and a spouse of one of the G1 affected individuals. Homozygous or potentially compound heterozygous variants that were predicted to be functional and segregated with the affected G1 siblings were further evaluated. Additionally, we considered variants in all genes segregating exclusively with the affected (G1) and pauci-symptomatic (G2) individuals to address the possibility of a pseudo-autosomal dominant inheritance pattern in this family. Results: All affected G1 individuals were homozygous for a novel truncating p.Tyr1433Ter DYSF (dysferlin) mutation, with their asymptomatic sibling and both pauci-symptomatic G2 offspring carrying only a single mutant allele. Sanger sequencing confirmed segregation of the variant. No additional potentially contributing variant was found in the DYSF or any other relevant gene in the pauci-symptomatic carriers.
Conclusion: Our finding of a truncating dysferlin mutation confirmed dysferlinopathy in this family and we propose that the single mutant allele is the primary contributor to the neuromuscular symptoms seen in the second-generation pauci-symptomatic carriers.
Item Exome sequencing identifies novel dysferlin mutation in a family with paucisymptomatic heterozygous carriers (BioMed Central, 2018) Jalali-Sefid-Dashti, Mahjoubeh; Nel, Melissa; Heckmann, Jeannine M.; Gamieldien, Junaid
BACKGROUND: We investigated a South African family of admixed ancestry in which the first generation (G1) developed insidious progressive distal to proximal weakness in their twenties, while their offspring (G2) experienced severe unexpected symptoms of myalgia and cramps since adolescence. Our aim was to identify deleterious mutations that segregate with the affected individuals in this family. METHODS: Exome sequencing was performed on five cases, which included three affected G1 siblings and two pauci-symptomatic G2 offspring. As controls we included an unaffected G1 sibling and a spouse of one of the G1 affected individuals. Homozygous or potentially compound heterozygous variants that were predicted to be functional and segregated with the affected G1 siblings were further evaluated. Additionally, we considered variants in all genes segregating exclusively with the affected (G1) and pauci-symptomatic (G2) individuals to address the possibility of a pseudo-autosomal dominant inheritance pattern in this family. RESULTS: All affected G1 individuals were homozygous for a novel truncating p.Tyr1433Ter DYSF (dysferlin) mutation, with their asymptomatic sibling and both pauci-symptomatic G2 offspring carrying only a single mutant allele. Sanger sequencing confirmed segregation of the variant. No additional potentially contributing variant was found in the DYSF or any other relevant gene in the pauci-symptomatic carriers.
CONCLUSION: Our finding of a truncating dysferlin mutation confirmed dysferlinopathy in this family and we propose that the single mutant allele is the primary contributor to the neuromuscular symptoms seen in the second-generation pauci-symptomatic carriers.
Item Genome assembly of next-generation sequencing data for the Oryx bacillus: species of the Mycobacterium tuberculosis complex (University of the Western Cape, 2011) Direko, Mmakamohelo; Christoffels, Alan; Gamieldien, Junaid
Next generation sequencing (NGS) technology platforms have accelerated the ability to produce completed genome assemblies. Recently, collaborators at Tygerberg Medical School outsourced the sequencing of Oryx bacillus, a member of the Mycobacterium tuberculosis complex (MTC). A total of 31,271,059 short reads were generated and required filtering, assembly and annotation using bioinformatics algorithms. In this project, an NGS assembly pipeline was implemented, tailored specifically for SOLiD sequence data. The raw reads were aligned to seven fully sequenced and annotated MTC members, namely, Mycobacterium tuberculosis H37Rv, H37Ra, CDC1551, F11, KZN 1435, Mycobacterium bovis AF2122/97 and Mycobacterium bovis BCG str. Pasteur 1173P2 using NovoalignCS. Depth and breadth of sequence coverage across each base of the reference genome were calculated using BEDTools. Structural variation at the nucleotide level, including deletions, insertions and single nucleotide polymorphisms (SNPs), was called using three tools: GATK, SAMtools and Nesoni. These variations were further filtered using in-house Perl scripts. Putative functional roles for the alterations at the DNA level were extrapolated from the overlap with essential genes present in annotated MTC members. Approximately 20,730,631 short reads (59.78%) out of a total of 31,271,059 reads aligned to the seven reference genomes. The per base sequence coverage calculations revealed an average of 1,243 unaligned regions.
These unaligned regions overlapped with mycobacterial regions of difference (RD) and genetic phage elements acquired by the MTC through horizontal gene transfer and are genes prevalent in the clinical isolates of M. tuberculosis. A total of 2,680 genetic variations were identified and categorised into 845 synonymous and 1,724 non-synonymous SNPs together with 44 insertions and 67 deletions. Some of the variant alleles overlapped genes known to be involved in TB drug resistance. While the biological significance of our findings remains to be elucidated, they nonetheless deserve further attention, because SNPs have the potential to impact on strain phenotype by gene disruption. Therefore, any hypotheses generated from these large-scale analyses will be tested by our collaborators at Tygerberg medical school.
Item Human coronavirus OC43 3CL protease and the potential of ML188 as a broad-spectrum lead compound: Homology modelling and molecular dynamic studies (Springer Nature, 2015) Berry, Michael; Fielding, Burtram; Gamieldien, Junaid
The coronavirus 3 chymotrypsin-like protease (3CLpro) is a validated target in the design of potential anticoronavirus inhibitors. The high degree of homology within the protease’s active site and substrate conservation supports the identification of broad spectrum lead compounds. A previous study identified the compound ML188, also termed 16R, as an inhibitor of the Severe Acute Respiratory Syndrome coronavirus (SARS-CoV) 3CLpro. This study will detail the generation of a homology model of the 3CLpro of the human coronavirus OC43 and determine the potential of 16R to form a broad-spectrum lead compound. MODELLER was used to generate a suitable three-dimensional model of the OC43 3CLpro and the Prime module of Schrödinger predicted the binding conformation and free energy of binding of 16R within the 3CLpro active site.
Molecular dynamics further confirmed ligand stability and hydrogen bonding networks.
Item Identification of new respiratory viruses in the new millennium (MDPI, 2015) Berry, Michael; Gamieldien, Junaid
The rapid advancement of molecular tools in the past 15 years has allowed for the retrospective discovery of several new respiratory viruses as well as the characterization of novel emergent strains. The inability to characterize the etiological origins of respiratory conditions, particularly in children, led several researchers to pursue the discovery of the underlying etiology of disease. In 2001, this led to the discovery of human metapneumovirus (hMPV); soon after, the outbreak of Severe Acute Respiratory Syndrome coronavirus (SARS-CoV) promoted an increased interest in coronavirology and the later discovery of human coronavirus (HCoV) NL63 and HCoV-HKU1. Human bocavirus, with its four separate lineages, discovered in 2005, has been linked to acute respiratory tract infections and gastrointestinal complications. Middle East Respiratory Syndrome coronavirus (MERS-CoV) represents the most recent outbreak of a completely novel respiratory virus, which occurred in Saudi Arabia in 2012 and presents a significant threat to human health. This review will detail the most current clinical and epidemiological findings on all respiratory viruses discovered since 2001.
Item Identification of novel prognostic markers of survival time in high-risk neuroblastoma using gene expression profiles (Impact Journals, 2020) Giwa, Abdulazeez; Fatai, Azeez A.; Gamieldien, Junaid
Neuroblastoma is the most common extracranial solid tumor in childhood. Patients in the high-risk group often have poor outcomes with low survival rates despite several treatment options. This study aimed to identify a genetic signature from gene expression profiles that can serve as prognostic indicators of survival time in patients with high-risk neuroblastoma, and that could be potential therapeutic targets.
RNA-seq count data was downloaded from the UCSC Xena browser and samples were grouped into Short Survival (SS) and Long Survival (LS) groups. Differential gene expression (DGE) analysis, enrichment analyses, regulatory network analysis and machine learning (ML) prediction of survival group were performed. Forty differentially expressed genes (DEGs) were identified including genes involved in molecular function activities essential for tumor proliferation. DEGs used as features for prediction of survival groups included EVX2, NHLH2, PRSS12, POU6F2, HOXD10, MAPK15, RTL1, LGR5, CYP17A1, OR10AB1P, MYH14, LRRTM3, GRIN3A, HS3ST5, CRYAB and NXPH3. An accuracy score of 82% was obtained by the ML classification models. SMIM28 was revealed to possibly have a role in tumor proliferation and aggressiveness. Our results indicate that these DEGs can serve as prognostic indicators of survival in high-risk neuroblastoma patients and will assist clinicians in making better therapeutic and patient management decisions. © 2020 Giwa et al.
Item Identification of potential biomarkers in lung cancer as possible diagnostic agents using bioinformatics and molecular approaches (University of the Western Cape, 2015) Ahmed, Firdous; Pretorius, Ashley; Gamieldien, Kareemah; Gamieldien, Junaid
Lung cancer remains the leading cause of cancer deaths worldwide, with the majority of cases attributed to non-small cell lung carcinomas. At the time of diagnosis, a large percentage of patients present with an advanced stage of disease, ultimately resulting in a poor prognosis. The identification of circulatory markers, overexpressed by the tumour tissue, could facilitate the discovery of an early, specific, non-invasive diagnostic tool as well as improving prognosis and treatment protocols. The aim was to analyse gene expression data from both microarray and RNA sequencing platforms, using bioinformatics and statistical analysis tools.
Enrichment analysis sought to identify genes that were differentially expressed (p < 0.05, FC > 2) and had the potential to be secreted into the extracellular circulation, using Gene Ontology Cellular Component terms. Results identified 1 657 statistically significant genes between normal and early lung cancer tissue, with only 1 gene differentially expressed (DE) between early- and late-stage disease. Following statistical analysis, 171 DE genes were selected as potential early-stage biomarkers. The overall sensitivity of RNAseq, in comparison to arrays, enabled the identification of 57 potential serum markers. These genes of interest were all downregulated in the tumour tissue, and while they did not facilitate the discovery of an ideal diagnostic marker based on the set criteria in this study, their roles in disease initiation and progression require further analysis.
Item Integrating regulatory and methylome data for the discovery of clear cell Renal Cell Carcinoma (ccRCC) variants (University of the Western Cape, 2015) Calvert-Joshua, Tracey; Tiffin, Nicki; Gamieldien, Junaid
Kidney cancers, of which clear cell renal cell carcinoma comprises an estimated 70%, have been placed amongst the top ten most common cancers in both males and females. With a mortality rate that exceeds 40%, kidney cancer is considered the most lethal cancer of the genitourinary system. Despite advances in its treatment, the mortality and incidence rates across all stages of the disease have continued to climb. Since the release of the Human Genome Project in the early 2000s, most genetics studies have focused on the protein coding region of the human genome, which accounts for a mere 2% of the entire genome. 
It has been suggested that diverting our focus to the other 98% of the genome, which was previously dismissed as non-functional “junk DNA”, could possibly contribute significantly to our understanding of the underlying mechanisms of complex diseases. In this study a whole genome sequencing somatic mutation data set from the International Cancer Genome Consortium was used. The non-coding somatic mutations within the promoter, intronic, 5-prime untranslated and 3-prime untranslated regions of clear cell renal cell carcinoma-implicated genes were extracted and submitted to RegulomeDB for functional annotation. As expected, most of the variants were located within the intronic regions and only a small subset of identified variants was predicted to be deleterious. Although the variants all belonged to a selected subset of kidney cancer-associated genes, the genes frequently mutated in the non-coding regions were not the same genes that were frequently mutated in the whole exome studies (where the focus is on the coding sequences). This indicates that with whole genome sequencing studies a new set of genes/variants previously unassociated with clear cell renal cell carcinoma could be identified. In addition, most of the non-coding somatic variants fell within multiple transcription factor binding sites. Since many of these variants were also deleterious (as predicted by RegulomeDB), this suggests that mutations in the non-coding regions could contribute to disease through the disruption of transcription factor binding sites and the subsequent impact on transcriptional regulation. The substantial overlap between the genes with the most aberrantly methylated variants and the genes with the most transcription factor binding site disruptions signifies a potential link between differential methylation and transcription factor binding site affinities. 
In contrast to the increased DNA methylation generally seen in promoter methylation studies, all of the significant hits in this study were hypomethylated, with subsequent up-regulation of the genes of interest, suggesting that in clear cell renal cell carcinoma, aberrant methylation may play a role in activating proto-oncogenes rather than in silencing genes. When a cross-analysis was carried out between the gene expression patterns and the transcription factor binding site disruptions, the non-coding somatic variants and differential methylation profiles, the genes affected again showed a clear overlap. Interestingly, most of the variants were not present in the 1000 Genomes data and thus represent novel mutations, which possibly occurred as a result of genomic instability. However, identifying novel variants is always promising, since they epitomise the possibility of developing pioneering ways to target diseases. The numerous detrimental effects a single non-coding mutation can have on other genomic processes have been demonstrated in this study and therefore validate the inclusion of non-coding regions of the genome in genetic studies of complex multifactorial diseases.
Item “An investigation into the MicroRNA-gene interactions involved in the pathogenesis of systemic lupus erythematosus” (University of the Western Cape, 2015) Pitts, Stephanie Julia; Tiffin, Nicki; Gamieldien, Junaid
Systemic lupus erythematosus (SLE) is a chronic, inflammatory disease characterised by the production of autoantibodies which target particularly the nuclear components of multiple cell types throughout the body. MicroRNAs (miRNAs) have been well established to regulate gene function by partial or complete binding to the 3’-UTR of the target genes, causing repression or complete degradation of the target gene. 
As a result, proteins normally produced by the targeted mRNA would exhibit a decrease in production. The aim of this study was to investigate the interactions between genes and microRNAs implicated in the pathogenesis of SLE. Objectives included curating lists of miRNAs and genes associated with lupus pathogenesis, identifying regulatory targets of miRNAs and genes targeted by miRNAs, and finding the intersections of these outputs. By examining the intersections of the resultant targets, we aimed to use Pathway Analysis to identify novel interactions that have not previously been reported in the scientific literature as being associated with the pathogenesis of SLE. Understanding the miRNA-gene target interactions in the progression of SLE may provide us with essential biomarkers and targets for disease diagnosis and therapy.
Item Massively-Parallel Computational Identification of Novel Broad Spectrum Antivirals to Combat Coronavirus Infection (University of the Western Cape, 2015) Berry, Michael; Gamieldien, Junaid
Given the significant disease burden caused by human coronaviruses, the discovery of an effective antiviral strategy is paramount; however, there is still no effective therapy to combat infection. This thesis details the in silico exploration of ligand libraries to identify candidate lead compounds that, based on multiple criteria, have a high probability of inhibiting the 3 chymotrypsin-like protease (3CLpro) of human coronaviruses. Atomistic models of the 3CLpro were obtained from the Protein Data Bank, or theoretical models were successfully generated by homology modelling. These structures served as the basis of both structure- and ligand-based drug design studies. Consensus molecular docking and pharmacophore modelling protocols were adapted to explore the ZINC Drugs-Now dataset in a high throughput virtual screening strategy to identify ligands which computationally bound to the active site of the 3CLpro. 
Molecular dynamics was further utilized to confirm that the binding mode and interactions observed in the static structure- and ligand-based techniques were correct, via analysis of various parameters in a 10 ns simulation. Molecular docking and pharmacophore models identified a total of 19 ligands which displayed the potential to computationally bind to all 3CLpro included in the study. Strategies employed to identify these lead compounds also indicated that a known inhibitor of the SARS-CoV 3CLpro has potential as a broad spectrum lead compound. Further analysis by molecular dynamics simulations largely confirmed the binding mode and ligand orientations identified by the former techniques. The comprehensive approach used in this study improves the probability of identifying experimental actives and represents a cost-effective pipeline for the often expensive and time-consuming process of lead discovery. These identified lead compounds represent an ideal starting point for assays to confirm in vitro activity, where experimentally confirmed actives will proceed to subsequent studies on lead optimization.
Item Modelling human protein interaction networks as metric spaces has potential in disease research and drug target discovery (BMC, 2014) Fadhal, Emad; Mwambene, Eric C; Gamieldien, Junaid
We have recently shown, by formally modelling human protein interaction networks (PINs) as metric spaces and classifying proteins into zones based on their distance from the topological centre, that hub proteins are primarily centrally located. We also showed that zones closest to the network centre are enriched for critically important proteins and are also functionally very specialised for specific ‘housekeeping’ functions. We proposed that proteins closest to the network centre may present good therapeutic targets. Here, we present multiple pieces of novel functional evidence that provide strong support for this hypothesis. 
We found that human PINs have a highly connected signalling core, with the majority of proteins involved in signalling located in the two zones closest to the topological centre. The majority of essential, disease related, tumour suppressor, oncogenic and approved drug target proteins were found to be centrally located. Similarly, proteins consistently expressed in 13 types of cancer are predominantly located in zones closest to the centre. Proteins from zones 1 and 2 were also found to comprise the majority of proteins in key KEGG pathways such as MAPK-signalling, the cell cycle, apoptosis and also pathways in cancer, with very similar patterns seen in pathways that lead to cancers such as melanoma and glioma, and non-neoplastic diseases such as measles, inflammatory bowel disease and Alzheimer’s disease.