Research Articles (SANBI)
Investigating antimicrobial resistance genes in Kenya, Uganda and Tanzania cattle using metagenomics
(PeerJ Inc., 2024) Omar, Kauthar M.; Babajide, Abiola A.; Kitundu, George L.
Antimicrobial resistance (AMR) is a growing problem in African cattle production systems, posing a threat to human and animal health and the associated economic value chain. However, there is a poor understanding of the resistomes in small-holder cattle breeds in East African countries. This study aims to examine the distribution of antimicrobial resistance genes (ARGs) in Kenya, Tanzania, and Uganda cattle using a metagenomics approach. We used the SqueezeMeta-ABRicate (assembly-based) pipeline to detect ARGs and benchmarked this approach using the Centrifuge-AMRplusplus (read-based) pipeline to evaluate its efficiency. Our findings reveal a significant number of ARGs of critical medical and economic importance in all three countries, including resistance to drugs of last resort such as carbapenems, suggesting the presence of highly virulent and antibiotic-resistant bacterial pathogens (ESKAPE) circulating in East Africa.

Covid-19 among adults living with HIV: Correlates of mortality among public sector healthcare users in Western Cape, South Africa
(Wiley, 2023) Kassanjee, Reshma; Davies, Mary-Ann; Tiffin, Nicki
Introduction: While a large proportion of people with HIV (PWH) have experienced SARS-CoV-2 infections, there is uncertainty about the role of HIV disease severity on COVID-19 outcomes, especially in lower-income settings. We studied the association of mortality with characteristics of HIV severity and management, and vaccination, among adult PWH. Methods: We analysed observational cohort data on all PWH aged ≥15 years experiencing a diagnosed SARS-CoV-2 infection (until March 2022), who accessed public sector healthcare in the Western Cape province of South Africa.
Logistic regression was used to study the association of mortality with evidence of antiretroviral therapy (ART) collection, time since first HIV evidence, CD4 cell count, viral load (among those with evidence of ART collection) and COVID-19 vaccination, adjusting for demographic characteristics, comorbidities, admission pressure, location and time period. Results: Mortality occurred in 5.7% (95% CI: 5.3, 6.0) of 17,831 first-diagnosed infections. Higher mortality was associated with lower recent CD4, no evidence of ART collection, high or unknown recent viral load and recent first HIV evidence, differentially by age. Vaccination was protective. The burden of comorbidities was high, and tuberculosis (especially more recent episodes of tuberculosis), chronic kidney disease, diabetes and hypertension were associated with higher mortality, more strongly in younger adults.

Record linkage for routinely collected health data in an African health information exchange
(Swansea University, 2023) Mutemaringa, Themba; Heekes, Alexa; Tiffin, Nicki
The Patient Master Index (PMI) plays an important role in the management of patient information and epidemiological research, and the availability of unique patient identifiers improves accuracy when linking patient records across disparate datasets. In our environment, however, a unique identifier is seldom present in all datasets containing patient information. Quasi-identifiers are used to attempt to link patient records but sometimes present a higher risk of over-linking. Data quality and completeness thus affect the ability to make correct linkages.
This paper describes the record linkage system that is currently implemented at the Provincial Health Data Centre (PHDC) in the Western Cape, South Africa, and assesses its output to date.

Application of an in silico approach identifies a genetic locus within ITGB2, and its interactions with HSPG2 and FGF9, to be associated with anterior cruciate ligament rupture risk
(Taylor and Francis Group, 2023) Dlamini, Senanile B.; Saunders, Colleen J.; Laguette, Mary-Jessica N.
We developed a Biomedical Knowledge Graph model that is phenotype and biological function-aware through integrating knowledge from multiple domains in a Neo4j graph database. All known human genes were assessed through the model to identify potential new risk genes for anterior cruciate ligament (ACL) ruptures and Achilles tendinopathy (AT). Genes were prioritised and explored in a case–control study comparing participants with ACL ruptures (ACL-R), including a sub-group with non-contact mechanism injuries (ACL-NON), to uninjured control individuals (CON). After gene filtering, 3376 genes, including 411 genes identified through previous whole exome sequencing, were found to be potentially linked to AT and ACL ruptures. Four variants were prioritised: HSPG2:rs2291826A/G, HSPG2:rs2291827G/A, ITGB2:rs2230528C/T and FGF9:rs2274296C/T. The rs2230528 CC genotype was over-represented in the CON group compared to ACL-R (p < 0.001) and ACL-NON (p < 0.001), and the TT genotype and T allele were over-represented in the ACL-R and ACL-NON groups compared to the CON group (p < 0.001).

Predicting amplification of MYCN using CpG methylation biomarkers in neuroblastoma
(Future Science Group, 2021) Giwa, Abdulazeez; Rossouw, Sophia Catherine; Fatai, Azeez
Neuroblastoma is the most common extracranial solid tumor in childhood. Amplification of MYCN in neuroblastoma is a predictor of poor prognosis.
Materials and methods: DNA methylation data from the TARGET data matrix were stratified into MYCN amplified and non-amplified groups. Differential methylation analysis, clustering, recursive feature elimination (RFE), machine learning (ML), Cox regression analysis and Kaplan–Meier estimates were performed. Results and Conclusion: 663 CpGs were differentially methylated between the two groups. A total of 25 CpGs were selected by RFE for clustering and ML, and a 100% clustering accuracy was obtained. ML validation on three external datasets produced high accuracy scores of 100%, 97% and 93%. Eight survival-associated CpGs were also identified. Therapeutic interventions may need to be targeted to patient subgroups.

Exploring new genetic variants within COL5A1 intron 4‐exon 5 region and TGF‐β family with risk of anterior cruciate ligament ruptures
(Wiley, 2020) Laguette, Mary‐Jessica N.; Barrow, Kelly; Saunders, Colleen J.
Variants within genes encoding structural and regulatory elements of ligaments have been associated with musculoskeletal soft tissue injury risk. The role of intron 4‐exon 5 variants within the α1 chain of type V collagen (COL5A1) gene and genes of the transforming growth factor‐β (TGF‐β) family, TGFBR3 and TGFBI, was investigated on the risk of anterior cruciate ligament (ACL) ruptures. A case‐control genetic association study was performed on 210 control (CON) and 249 participants with surgically diagnosed ruptures (ACL), of which 147 reported a noncontact mechanism of injury (NON). Whole‐exome sequencing data were used to prioritize variants of potential functional relevance.

Computational characterization of iron metabolism in the tsetse disease vector, Glossina morsitans: IRE stem-loops
(BMC, 2016) Dashti, Zahra Jalali Sefid; Gamieldien, Junaid; Christoffels, Alan
Iron metabolism and regulation is an indispensable part of species survival, most importantly for blood feeding insects.
Iron regulatory proteins are central regulators of iron homeostasis, whose binding to iron response element (IRE) stem-loop structures within the UTRs of genes regulates expression at the post-transcriptional level. Despite the extensive literature on the mechanism of iron regulation in humans, less attention has been given to insects, and more specifically to blood feeding insects, where research has mainly focused on the characterization of ferritin and transferrin. We thus examined the mechanism of iron homeostasis through a genome-wide computational identification of IREs and other enriched motifs in the UTRs of Glossina morsitans with a view to identifying new IRE-regulated genes.

Variant-specific introduction and dispersal dynamics of SARS-CoV-2 in New York City – from Alpha to Omicron
(Public Library of Science, 2023) Dellicour, Simon; Hong, Samuel L.; Harkins, Gordon W.
Since the latter part of 2020, SARS-CoV-2 evolution has been characterised by the emergence of viral variants associated with distinct biological characteristics. While the main research focus has centred on the ability of new variants to increase in frequency and impact the effective reproductive number of the virus, less attention has been placed on their relative ability to establish transmission chains and to spread through a geographic area. Here, we describe a phylogeographic approach to estimate and compare the introduction and dispersal dynamics of the main SARS-CoV-2 variants – Alpha, Iota, Delta, and Omicron – that circulated in the New York City area between 2020 and 2022. Notably, our results indicate that Delta had a lower ability to establish sustained transmission chains in the NYC area and that Omicron (BA.1) was the variant fastest to disseminate across the study area.
The analytical approach presented here complements non-spatially-explicit analytical approaches that seek a better understanding of the epidemiological differences that exist among successive SARS-CoV-2 variants of concern.

RAMICS: Trainable, high-speed and biologically relevant alignment of high-throughput sequencing reads to coding DNA
(Oxford University Press, 2014) Wright, Imogen A.; Travers, Simon A.
The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA.

FRAGS: Estimation of coding sequence substitution rates from fragmentary data
(BMC, 2004) Swart, Estienne C; Hide, Winston A; Seoighe, Cathal
Rates of substitution in protein-coding sequences can provide important insights into evolutionary processes that are of biomedical and theoretical interest. Increased availability of coding sequence data has enabled researchers to estimate more accurately the coding sequence divergence of pairs of organisms. However, the use of different data sources, alignment protocols and methods to estimate substitution rates leads to widely varying estimates of key parameters that define the coding sequence divergence of orthologous genes.
Although complete genome sequence data are not available for all organisms, fragmentary sequence data can provide accurate estimates of substitution rates provided that an appropriate and consistent methodology is used and that differences in the estimates obtainable from different data sources are taken into account.

The contribution of exon-skipping events on chromosome 22 to protein coding diversity
(Cold Spring Harbor Laboratory Press, 2001) Hide, Winston A.; Babenko, Vladimir N.; van Heusden, Peter A.
Completion of the human genome sequence provides evidence for a gene count with lower bound 30,000–40,000. Significant protein complexity may derive in part from multiple transcript isoforms. Recent EST-based studies have revealed that alternate transcription, including alternative splicing, polyadenylation and transcription start sites, occurs within at least 30–40% of human genes. Transcript form surveys have yet to integrate the genomic context, expression, frequency, and contribution to protein diversity of isoform variation. We determine here the degree to which protein coding diversity may be influenced by alternate expression of transcripts by exhaustive manual confirmation of genome sequence annotation, and comparison to available transcript data to accurately associate skipped exon isoforms with genomic sequence. Relative expression levels of transcripts are estimated from EST database representation. The rigorous in silico method accurately identifies exon skipping using verified genome sequence. 545 genes have been studied in this first hand-curated assessment of exon skipping on chromosome 22.

Transforming RNA-Seq gene expression to track cancer progression in the multi-stage early to advanced-stage cancer development
(Public Library of Science, 2023) Livesey, Michelle; Rossouw, Sophia Catherine; Blignaut, Renette
Cancer progression can be tracked by gene expression changes that occur throughout early-stage to advanced-stage cancer development.
The accumulated genetic changes can be detected when gene expression levels in advanced-stage samples are less variable but show high variability in early-stage samples. Normalizing advanced-stage expression samples with early-stage samples and clustering the normalized expression samples can reveal cancers with similar or different progression and provide insight into clinical and phenotypic patterns of patient samples within the same cancer.

Changes in subcutaneous adipose tissue microRNA expression in response to exercise training in African women with obesity
(Nature Research, 2022) Pheiffer, Carmen; Dias, Stephanie; Pretorius, Ashley
The mechanisms that underlie exercise-induced adaptations in adipose tissue have not been elucidated, yet accumulating studies suggest an important role for microRNAs (miRNAs). This study aimed to investigate miRNA expression in gluteal subcutaneous adipose tissue (GSAT) in response to a 12-week exercise intervention in South African women with obesity, and to assess depot-specific differences in miRNA expression in GSAT and abdominal subcutaneous adipose tissue (ASAT). In addition, the association between exercise-induced changes in miRNA expression and metabolic risk was evaluated. Women underwent 12 weeks of supervised aerobic and resistance training (n = 19) or maintained their regular physical activity during this period (n = 12). Exercise-induced miRNAs were identified in GSAT using Illumina sequencing, followed by analysis of differentially expressed miRNAs in GSAT and ASAT using quantitative real-time PCR.
Associations between the changes (pre- and post-exercise training) in miRNA expression and metabolic parameters were evaluated using Spearman’s correlation tests.

Role of Indigenous and local knowledge in seasonal forecasts and climate adaptation: A case study of smallholder farmers in Chiredzi, Zimbabwe
(Elsevier, 2023) Zvobgo, Luckson; Johnston, Peter; Olagbegi, Oladapo M.
Accessible, reliable and diverse sources of climate information are needed to inform climate change adaptation at all levels of society, particularly for vulnerable sectors such as smallholder farming. Globally, many smallholder farmers use Indigenous knowledge (IK) and local knowledge (LK) to forecast weather and climate; however, less is known about how the use of these forecasts connects to decisions and actions for reducing climate risks. We examined the role of IK and LK in seasonal forecasting and the broader climate adaptation decision-making of smallholder farmers in Chiredzi, Zimbabwe. The data were collected from a sample of 100 smallholder farmers. Seventy-three of the 100 interviewed farmers used IK and LK weather and climate forecasts, and 32% relied solely on IK and LK forecasts for climate adaptation decision-making. Observations of cuckoo birds, leaf sprouting of Mopane trees, high summer temperatures, and Nimbus clouds are the main indicators used for IK and LK forecasts. The use of IK and LK climate forecasts was significantly positively associated with increasing farmer age and farmland size.

Using multiplex amplicon PCR technology to efficiently and timely generate Rift Valley fever virus sequence data for genomic surveillance
(MDPI, 2023) Juma, John; Konongoi, Samson L.; Nsengimana, Isidore
Rift Valley fever (RVF) is a febrile vector-borne disease that is endemic in Africa and continues to spread into new territories. It is a climate-sensitive disease mostly triggered by abnormal rainfall patterns.
The disease is associated with high mortality and morbidity in both humans and livestock. RVF is caused by the Rift Valley fever virus (RVFV) of the genus Phlebovirus in the family Phenuiviridae. It is a tripartite RNA virus with three genomic segments: small (S), medium (M) and large (L). Pathogen genomic sequencing is becoming a routine procedure and a powerful tool for understanding the evolutionary dynamics of infectious organisms, including viruses. Inspired by the utility of amplicon-based sequencing demonstrated in severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) and Ebola, Zika and West Nile viruses, we report an RVFV sample preparation based on amplicon multiplex polymerase chain reaction (amPCR) for template enrichment and reduction of background host contamination.

Investigation of distinct gene expression profile patterns that can improve the classification of intermediate-risk prognosis in AML patients
(Frontiers In Genetics, 2023) Eshibona, N
Acute myeloid leukemia (AML) is a heterogeneous type of blood cancer that generally affects the elderly. AML patients are categorized into favorable-, intermediate-, and adverse-risk groups based on an individual’s genomic features and chromosomal abnormalities. Despite the risk stratification, the progression and outcome of the disease remain highly variable. To facilitate and improve the risk stratification of AML patients, the study focused on gene expression profiling of AML patients within various risk categories.

Application of an in silico approach identifies a genetic locus within ITGB2, and its interactions with HSPG2 and FGF9, to be associated with anterior cruciate ligament rupture risk
(Taylor and Francis Group, 2023) Dlamini, Senanile B.; Saunders, Colleen J.; Gamieldien, Junaid
We developed a Biomedical Knowledge Graph model that is phenotype and biological function-aware through integrating knowledge from multiple domains in a Neo4j graph database.
All known human genes were assessed through the model to identify potential new risk genes for anterior cruciate ligament (ACL) ruptures and Achilles tendinopathy (AT). Genes were prioritised and explored in a case–control study comparing participants with ACL ruptures (ACL-R), including a sub-group with non-contact mechanism injuries (ACL-NON), to uninjured control individuals (CON). After gene filtering, 3376 genes, including 411 genes identified through previous whole exome sequencing, were found to be potentially linked to AT and ACL ruptures. Four variants were prioritised: HSPG2:rs2291826A/G, HSPG2:rs2291827G/A, ITGB2:rs2230528C/T and FGF9:rs2274296C/T. The rs2230528 CC genotype was over-represented in the CON group compared to ACL-R (p < 0.001) and ACL-NON (p < 0.001), and the TT genotype and T allele were over-represented in the ACL-R and ACL-NON groups compared to the CON group (p < 0.001). Several significant differences in distributions were noted for the gene-gene interactions: (HSPG2:rs2291826, rs2291827 and ITGB2:rs2230528) and (ITGB2:rs2230528 and FGF9:rs2297429).

Proteomics analysis of the p.G849D variant in neurexin 2 alpha may reveal insight into Parkinson’s disease pathobiology
(Frontiers, 2022) Cloete, R
Parkinson’s disease (PD), the fastest-growing neurological disorder globally, has a complex etiology. A previous study by our group identified the p.G849D variant in neurexin 2 (NRXN2), encoding the synaptic protein, NRXN2α, as a possible causal variant of PD. Therefore, we aimed to perform functional studies using proteomics in an attempt to understand the biological pathways affected by the variant. We hypothesized that this may reveal insight into the pathobiology of PD. Wild-type and mutant NRXN2α plasmids were transfected into SH-SY5Y cells. Thereafter, total protein was extracted and prepared for mass spectrometry using a Thermo Scientific Fusion mass spectrometer equipped with a Nanospray Flex ionization source.
The data were then interrogated against the UniProt H. sapiens database and afterward, pathway and enrichment analyses were performed using in silico tools. Overexpression of the wild-type protein led to the enrichment of proteins involved in neurodegenerative diseases, while overexpression of the mutant protein led to the decline of proteins involved in ribosomal functioning.

SysBiolPGWAS: simplifying post-GWAS analysis through the use of computational technologies and integration of diverse omics datasets
(Bioinformatics, 2022) Ajayi, O
Motivation: Post-genome-wide association studies (pGWAS) analysis is designed to decipher the functional consequences of significant single-nucleotide polymorphisms (SNPs) in the era of GWAS. This can be translated into research insights and clinical benefits such as the effectiveness of strategies for disease screening, treatment and prevention. However, the setup of pGWAS tools can be quite complicated, and it mostly requires big data. The challenge, however, is that scientists are required to have sufficient experience with several of these technically complex tools in order to complete the pGWAS analysis. Results: We present SysBiolPGWAS, a pGWAS web application that provides comprehensive functionality for biologists and non-bioinformaticians to conduct several pGWAS analyses to overcome the above challenges. It provides unique functionalities for analysis involving multi-omics datasets and visualization using various bioinformatics tools.

SysBiolPGWAS: Simplifying post-GWAS analysis through the use of computational technologies and integration of diverse omics datasets
(Oxford University Press, 2023) Falola, Oluwadamilare; Adam, Yagoub; Ajayi, Olabode
Post-genome-wide association studies (pGWAS) analysis is designed to decipher the functional consequences of significant single-nucleotide polymorphisms (SNPs) in the era of GWAS.
This can be translated into research insights and clinical benefits such as the effectiveness of strategies for disease screening, treatment and prevention. However, the setup of pGWAS tools can be quite complicated, and it mostly requires big data. The challenge, however, is that scientists are required to have sufficient experience with several of these technically complex tools in order to complete the pGWAS analysis. We present SysBiolPGWAS, a pGWAS web application that provides comprehensive functionality for biologists and non-bioinformaticians to conduct several pGWAS analyses to overcome the above challenges. It provides unique functionalities for analysis involving multi-omics datasets and visualization using various bioinformatics tools. SysBiolPGWAS provides access to individual pGWAS tools and a novel custom pGWAS pipeline that integrates several individual pGWAS tools and data. The SysBiolPGWAS app was developed to be a one-stop shop for pGWAS analysis. It targets researchers in the area of the human genome and performs its analysis mainly on the autosomal chromosomes.