Research Articles (SANBI)

Permanent URI for this collection: https://hdl.handle.net/10566/798

Implementation of a genotyped African population cohort, with virtual follow-up: A feasibility study in the Western Cape Province, South Africa
(F1000 Research Ltd, 2025) Tamuhla, Tsaone; Tiffin, Nicki; Coussens, Anna
Background: There is limited knowledge regarding African genetic drivers of disease due to prohibitive costs of large-scale genomic research in Africa. Methods: We piloted a cost-effective, scalable virtual genotyped cohort in South Africa that was affordable in this resource-limited context, with participant recruitment using a tiered informed consent model and DNA collection by buccal swab. Genotype data were generated using the H3Africa Illumina micro-array, and phenotype data were derived from routine health data of participants. We demonstrated feasibility of nested case-control genome-wide association studies using these data for the phenotypes type 2 diabetes mellitus (T2DM) and severe COVID-19. Results: 2267346 variants were analysed in 459 participant samples, of which 229 (66.8%) are female. 78.6% of SNPs and 74% of samples passed quality control (QC). Principal component analysis showed extensive ancestry admixture in study participants. Of the 343 samples that passed QC, 93 participants had T2DM and 63 had severe COVID-19. For 1780 previously published COVID-19-associated variants, 3 SNPs in the pre-imputation data and 23 SNPs in the imputed data were significantly associated with severe COVID-19 cases compared to controls (p<0.05). For 2755 published T2DM-associated variants, 69 SNPs in the pre-imputation data and 419 SNPs in the imputed data were significantly associated with T2DM cases when compared to controls (p<0.05). Conclusions: The results shown here are illustrative of what will be possible as the cohort expands in the future. Here we demonstrate the feasibility of this approach, recognising that the findings presented here are preliminary and require further validation once we have a sufficient sample size to improve statistical significance of findings.
We implemented a genotyped population cohort with virtual follow-up data in a resource-constrained African environment, demonstrating feasibility for scale-up and novel health discoveries through nested case-control studies. Copyright: © 2025 Tamuhla T et al. What are we researching? Our study focuses on understanding the genetic causes of diseases in African populations, where there is very little genetic data available. Even though it's cheaper now to gather genetic data, it's still expensive to collect the large amounts of data needed to study these populations properly. What were the aims of the research? Create an African Genotyped Cohort: This means gathering genetic information from African participants using tools made specifically for African genetics. Use Strong Informed Consent: We wanted to make sure participants understood and agreed to the study, allowing us to grow the cohort over time. Use Existing Health Data: To make the process affordable, we used health data that was already being collected. What did we do? We made sure our consent process was clear and thorough. We collected DNA samples from consenting participants using cheek swabs and existing blood samples. We generated and checked the quality of the genetic data. We tested if we could study diseases like Type 2 Diabetes and severe COVID-19 with this approach. What did we find? In our pilot study, we genotyped 459 samples, with 74% being good quality. We analysed over 2 million genetic markers, with 78.6% passing quality checks. Our study shows that this approach works well in African populations and can analyse their genetic diversity. The next step is to expand the cohort for more comprehensive studies.
A flavonoid-based approach to mitigating hyperglycaemia: in silico analysis of AMPK activation, membrane and metabolic effects of rutin in HepG2 cells
(John Wiley and Sons Inc, 2025) Odugbemi, Adeshina Isaiah; Tata, Fave Yohanna; Ugbaja, Samuel Chima
Hyperglycaemia exacerbates tissue damage and complications in COVID-19 and metabolic disorders. Flavonoids have shown diverse therapeutic potential in metabolic and infectious diseases, yet their molecular mechanisms remain unclear. The 5' adenosine monophosphate-activated protein kinase (AMPK) is an energy sensor and a promising therapeutic target in diabetes and viral infections. This study investigated the binding affinity, structural stability and conformational dynamics of flavonoids with AMPK through molecular docking, molecular dynamics (MD) simulations (Maestro Schrodinger, Amber 18) carried out for 100 ns, and MM/GBSA binding free energy calculations. In vitro cytotoxicity and biochemical assays, including MTT, ATP, ΔΨm, CYP3A4, and LDH, were performed on HepG2 cells and analyzed using one-way ANOVA with Tukey's post hoc test. Rutin showed the strongest binding to AMPK (−64.87 ± 10.07 kcal/mol), indicating superior binding and potential biological effects. Computational analysis identified key residues Glu86, Glu92, Glu135 and Asp149 involved in direct hydrogen bonding and electrostatic interactions, contributing to binding affinity and stability in the AMPK binding pocket. Rutin maintained nearly 100% cell viability up to 500 µM, significantly increased ATP and CYP3A4 levels while reducing LDH without impacting ΔΨm, indicating preserved mitochondrial integrity, enhanced metabolism and bioenergetics. These findings showed rutin as a safe and potent AMPK modulator with potential for further optimization and preclinical evaluations to treat diabetes and other diseases associated with mitochondrial dysfunction and metabolic stress.
Bayesian estimation of HIV acquisition dates for prevention trials
(American Society for Microbiology, 2025) Labuschagne, Phillip; Rossenkhan, Raabya; Giorgi, Elena Edi
Accurate timing estimates of when participants acquire HIV in HIV prevention trials are necessary for determining antibody levels at acquisition. The Antibody-Mediated Prevention (AMP) Studies showed that a passively administered broadly neutralizing antibody can prevent the acquisition of HIV from a neutralization-sensitive virus. We developed a pipeline for estimating the date of detectable HIV acquisition (DDA) in AMP Study participants using diagnostic and viral sequence data. The pipeline uses a Bayesian strategy that combines three streams of data (REN [rev/vpu/env/Δnef] sequence, GP [gag/Δpol] sequence, and diagnostic) where their 95% credible intervals overlap, based on pre-specified criteria and decision rules. We evaluated the performance of our AMP pipeline using PacBio viral sequence data from 41 participants across two prospective acute HIV acquisition cohort studies, FRESH and RV217, with twice-weekly sampling. These cohort studies enrolled young women in South Africa and men and women in Kenya and Thailand, respectively, with a high likelihood of HIV acquisition. In evaluating performance, “true DDA” was the center of bounds between last-negative and first-positive RNA diagnostic tests (median time 4 days, range 2–7 days); bias was the mean difference between estimated and true DDA. Using diagnostic data alone yielded timing estimates with a bias of 2.4 days and root mean square error (RMSE) of 7.9 days. These results were improved using sequence + diagnostic data (bias 1.5 days, RMSE 6.9 days), as well as by restricting sequence-based estimation to samples from ≤5 weeks post-DDA (bias 0.2 days, RMSE 7.8 days).
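The evaluation metrics named in this abstract can be illustrated with a minimal Python sketch. It assumes dates are expressed as day numbers relative to an arbitrary origin; the function names and the toy values are hypothetical and are not taken from the AMP pipeline or its data.

```python
from math import sqrt
from statistics import mean


def true_dda(last_negative_day, first_positive_day):
    """Centre of the window between last-negative and first-positive tests."""
    return (last_negative_day + first_positive_day) / 2


def bias_and_rmse(estimated, true):
    """Bias = mean of (estimated - true); RMSE = sqrt of the mean squared error."""
    errors = [e - t for e, t in zip(estimated, true)]
    bias = mean(errors)
    rmse = sqrt(mean(err ** 2 for err in errors))
    return bias, rmse


# Toy diagnostic windows (last-negative day, first-positive day); illustrative only.
windows = [(10, 14), (20, 22), (30, 37)]
true_days = [true_dda(neg, pos) for neg, pos in windows]

# Hypothetical estimates produced by some timing method.
estimated_days = [13.0, 20.0, 36.5]

bias, rmse = bias_and_rmse(estimated_days, true_days)
print(f"bias = {bias:.2f} days, RMSE = {rmse:.2f} days")
```

A bias near zero with a larger RMSE, as in the ≤5 weeks result above, indicates that errors largely cancel on average while individual estimates still vary by several days.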
Ten quick tips for protecting health data using de-identification and perturbation of structured datasets
(Public Library of Science, 2025) Lulamba, Tshikala Eddie; Mutemaringa, Themba; Tiffin, Nicki
Structured patient data generated within the health data ecosystem are shared both internally for operational use and also externally for research and public health benefit. Protecting individual privacy and health data confidentiality in these contexts relies on data de-identification and anonymisation, although there are no universally accepted standards for these processes and the techniques involved can be technically complex. We present practical recommendations grounded in the principle of data minimisation: avoiding unnecessary granularity and identifying variables that could lead to re-identification when combined with other datasets. We provide practical guidance for anonymising and perturbing structured health data in ways that support compliance with data protection laws, describing technical and operational methods for reducing re-identification risk that include rounding numerical values, replacing precise values with ranges, adding jitter to numeric fields, aggregating data, management of date values and separating sensitive fields from identifying data to prevent linkage leading to re-identification. While some methods require advanced technical knowledge, we focus here on accessible strategies that can be implemented without specialist expertise, recognising the importance of the legal and governance frameworks in which anonymisation occurs. These guidelines support researchers, data managers and institutions in sharing health data responsibly, maintaining data utility while upholding privacy and promoting ethical and legal data stewardship for data-driven health research.
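Several of the perturbation methods listed in this abstract (rounding, ranges, jitter, coarser dates) can be sketched in a few lines of Python. This is an illustrative sketch only: the field names, step sizes and the choice to reduce dates to year and month are assumptions made here, not recommendations taken from the article.

```python
import random
from datetime import date


def round_to(value, step):
    """Round a numeric value to the nearest multiple of `step`."""
    return step * round(value / step)


def to_range(value, width):
    """Replace a precise integer with the range (band) that contains it."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"


def jitter(value, max_offset, rng):
    """Add uniform random noise in [-max_offset, +max_offset]."""
    return value + rng.uniform(-max_offset, max_offset)


def coarsen_date(d):
    """Keep only year and month (one possible way to manage date values)."""
    return d.strftime("%Y-%m")


# Hypothetical record; field names are assumptions for illustration.
record = {"age": 47, "weight_kg": 83.4, "glucose": 6.7, "visit": date(2024, 3, 18)}

rng = random.Random(0)  # seeded only so the example is repeatable
safe = {
    "age_band": to_range(record["age"], 10),
    "weight_kg": round_to(record["weight_kg"], 5),
    "glucose": jitter(record["glucose"], 0.5, rng),
    "visit_month": coarsen_date(record["visit"]),
}
print(safe)
```

Each transformation trades some precision for lower re-identification risk, so the step sizes would need to be chosen against the analytic use of the data.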
Investigating antimicrobial resistance genes in Kenya, Uganda and Tanzania cattle using metagenomics
(PeerJ Inc., 2024) Omar, Kauthar M.; Babajide, Abiola A.
Antimicrobial resistance (AMR) is a growing problem in African cattle production systems, posing a threat to human and animal health and the associated economic value chain. However, there is a poor understanding of the resistomes in small-holder cattle breeds in East African countries. This study aims to examine the distribution of antimicrobial resistance genes (ARGs) in Kenya, Tanzania, and Uganda cattle using a metagenomics approach. We used the SqueezeMeta-Abricate (assembly-based) pipeline to detect ARGs and benchmarked this approach using the Centrifuge-AMRplusplus (read-based) pipeline to evaluate its efficiency. Our findings reveal a significant number of ARGs of critical medical and economic importance in all three countries, including resistance to drugs of last resort such as carbapenems, suggesting the presence of highly virulent and antibiotic-resistant bacterial pathogens (ESKAPE) circulating in East Africa. Shared ARGs such as aph(6)-id (aminoglycoside phosphotransferase), tet (tetracycline resistance gene), sul2 (sulfonamide resistance gene) and cfxA_gen (beta-lactamase gene) were detected. Assembly-based methods revealed fewer ARGs compared to read-based methods, indicating the sensitivity and specificity of read-based methods in resistome characterization. Our findings call for further surveillance to estimate the intensity of the antibiotic resistance problem and wider resistome classification. Effective management of livestock and antibiotic consumption is crucial in minimizing antimicrobial resistance and maximizing productivity, making these findings relevant to stakeholders, agriculturists, and veterinarians in East Africa and Africa at large.
Investigating antimicrobial resistance genes in Kenya, Uganda and Tanzania cattle using metagenomics
(PeerJ Inc., 2024) Omar, Kauthar M.; Babajide, Abiola A.; Kitundu, George L.
Antimicrobial resistance (AMR) is a growing problem in African cattle production systems, posing a threat to human and animal health and the associated economic value chain. However, there is a poor understanding of the resistomes in small-holder cattle breeds in East African countries. This study aims to examine the distribution of antimicrobial resistance genes (ARGs) in Kenya, Tanzania, and Uganda cattle using a metagenomics approach. We used the SqueezeMeta-Abricate (assembly-based) pipeline to detect ARGs and benchmarked this approach using the Centrifuge-AMRplusplus (read-based) pipeline to evaluate its efficiency. Our findings reveal a significant number of ARGs of critical medical and economic importance in all three countries, including resistance to drugs of last resort such as carbapenems, suggesting the presence of highly virulent and antibiotic-resistant bacterial pathogens (ESKAPE) circulating in East Africa.
COVID-19 among adults living with HIV: Correlates of mortality among public sector healthcare users in Western Cape, South Africa
(Wiley, 2023) Kassanjee, Reshma; Davies, Mary-Ann; Tiffin, Nicki
Introduction: While a large proportion of people with HIV (PWH) have experienced SARS-CoV-2 infections, there is uncertainty about the role of HIV disease severity on COVID-19 outcomes, especially in lower-income settings. We studied the association of mortality with characteristics of HIV severity and management, and vaccination, among adult PWH. Methods: We analysed observational cohort data on all PWH aged ≥15 years experiencing a diagnosed SARS-CoV-2 infection (until March 2022), who accessed public sector healthcare in the Western Cape province of South Africa. Logistic regression was used to study the association of mortality with evidence of antiretroviral therapy (ART) collection, time since first HIV evidence, CD4 cell count, viral load (among those with evidence of ART collection) and COVID-19 vaccination, adjusting for demographic characteristics, comorbidities, admission pressure, location and time period. Results: Mortality occurred in 5.7% (95% CI: 5.3,6.0) of 17,831 first-diagnosed infections. Higher mortality was associated with lower recent CD4, no evidence of ART collection, high or unknown recent viral load and recent first HIV evidence, differentially by age. Vaccination was protective. The burden of comorbidities was high, and tuberculosis (especially more recent episodes of tuberculosis), chronic kidney disease, diabetes and hypertension were associated with higher mortality, more strongly in younger adults.
Record linkage for routinely collected health data in an African health information exchange
(Swansea University, 2023) Mutemaringa, Themba; Heekes, Alexa; Tiffin, Nicki
The Patient Master Index (PMI) plays an important role in management of patient information and epidemiological research, and the availability of unique patient identifiers improves the accuracy when linking patient records across disparate datasets. In our environment, however, a unique identifier is seldom present in all datasets containing patient information. Quasi identifiers are used to attempt to link patient records but sometimes present higher risk of over-linking. Data quality and completeness thus affect the ability to make correct linkages. This paper describes the record linkage system that is currently implemented at the Provincial Health Data Centre (PHDC) in the Western Cape, South Africa, and assesses its output to date.
Application of an in silico approach identifies a genetic locus within ITGB2, and its interactions with HSPG2 and FGF9, to be associated with anterior cruciate ligament rupture risk
(Taylor and Francis Group, 2023) Dlamini, Senanile B.; Saunders, Colleen J.; Laguette, Mary-Jessica N.
We developed a Biomedical Knowledge Graph model that is phenotype and biological function-aware through integrating knowledge from multiple domains in a Neo4j graph database. All known human genes were assessed through the model to identify potential new risk genes for anterior cruciate ligament (ACL) ruptures and Achilles tendinopathy (AT). Genes were prioritised and explored in a case–control study comparing participants with ACL ruptures (ACL-R), including a sub-group with non-contact mechanism injuries (ACL-NON), to uninjured control individuals (CON). After gene filtering, 3376 genes, including 411 genes identified through previous whole exome sequencing, were found to be potentially linked to AT and ACL ruptures. Four variants were prioritised: HSPG2:rs2291826A/G, HSPG2:rs2291827G/A, ITGB2:rs2230528C/T and FGF9:rs2274296C/T. The rs2230528 CC genotype was over-represented in the CON group compared to ACL-R (p < 0.001) and ACL-NON (p < 0.001), and the TT genotype and T allele were over-represented in the ACL-R and ACL-NON groups compared to the CON group (p < 0.001).
Predicting amplification of MYCN using CpG methylation biomarkers in neuroblastoma
(Future Science Group, 2021) Giwa, Abdulazeez; Rossouw, Sophia Catherine; Fatai, Azeez
Neuroblastoma is the most common extracranial solid tumor in childhood. Amplification of MYCN in neuroblastoma is a predictor of poor prognosis. Materials and methods: DNA methylation data from the TARGET data matrix were stratified into MYCN amplified and non-amplified groups. Differential methylation analysis, clustering, recursive feature elimination (RFE), machine learning (ML), Cox regression analysis and Kaplan–Meier estimates were performed. Results and Conclusion: 663 CpGs were differentially methylated between the two groups. A total of 25 CpGs were selected by RFE for clustering and ML, and a 100% clustering accuracy was obtained. ML validation on three external datasets produced high accuracy scores of 100%, 97% and 93%. Eight survival-associated CpGs were also identified. Therapeutic interventions may need to be targeted to patient subgroups.
Exploring new genetic variants within COL5A1 intron 4‐exon 5 region and TGF‐β family with risk of anterior cruciate ligament ruptures
(Wiley, 2020) Laguette, Mary‐Jessica N.; Barrow, Kelly; Saunders, Colleen J.
Variants within genes encoding structural and regulatory elements of ligaments have been associated with musculoskeletal soft tissue injury risk. The role of intron 4‐exon 5 variants within the α1 chain of type V collagen (COL5A1) gene and genes of the transforming growth factor‐β (TGF‐β) family, TGFBR3 and TGFBI, in the risk of anterior cruciate ligament (ACL) ruptures was investigated. A case‐control genetic association study was performed on 210 control participants (CON) and 249 participants with surgically diagnosed ruptures (ACL), of which 147 reported a noncontact mechanism of injury (NON). Whole‐exome sequencing data were used to prioritize variants of potential functional relevance.
Computational characterization of iron metabolism in the tsetse disease vector, Glossina morsitans: IRE stem-loops
(BMC, 2016) Dashti, Zahra Jalali Sefid; Gamieldien, Junaid; Christoffels, Alan
Iron metabolism and regulation is an indispensable part of species survival, most importantly for blood-feeding insects. Iron regulatory proteins are central regulators of iron homeostasis, whose binding to iron response element (IRE) stem-loop structures within the UTRs of genes regulates expression at the post-transcriptional level. Despite the extensive literature on the mechanism of iron regulation in humans, less attention has been given to insects and more specifically the blood-feeding insects, where research has mainly focused on the characterization of ferritin and transferrin. We thus examined the mechanism of iron homeostasis through a genome-wide computational identification of IREs and other enriched motifs in the UTRs of Glossina morsitans with a view to identifying new IRE-regulated genes.
Variant-specific introduction and dispersal dynamics of SARS-CoV-2 in New York City – from Alpha to Omicron
(Public Library of Science, 2023) Dellicour, Simon; Hong, Samuel L.; Harkins, Gordon W.
Since the latter part of 2020, SARS-CoV-2 evolution has been characterised by the emergence of viral variants associated with distinct biological characteristics. While the main research focus has centred on the ability of new variants to increase in frequency and impact the effective reproductive number of the virus, less attention has been placed on their relative ability to establish transmission chains and to spread through a geographic area. Here, we describe a phylogeographic approach to estimate and compare the introduction and dispersal dynamics of the main SARS-CoV-2 variants – Alpha, Iota, Delta, and Omicron – that circulated in the New York City area between 2020 and 2022. Notably, our results indicate that Delta had a lower ability to establish sustained transmission chains in the NYC area and that Omicron (BA.1) was the variant fastest to disseminate across the study area. The analytical approach presented here complements non-spatially-explicit analytical approaches that seek a better understanding of the epidemiological differences that exist among successive SARS-CoV-2 variants of concern.
RAMICS: Trainable, high-speed and biologically relevant alignment of high-throughput sequencing reads to coding DNA
(Oxford University Press, 2014) Wright, Imogen A.; Travers, Simon A.
The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA.
FRAGS: Estimation of coding sequence substitution rates from fragmentary data
(BMC, 2004) Swart, Estienne C; Hide, Winston A; Seoighe, Cathal
Rates of substitution in protein-coding sequences can provide important insights into evolutionary processes that are of biomedical and theoretical interest. Increased availability of coding sequence data has enabled researchers to estimate more accurately the coding sequence divergence of pairs of organisms. However, the use of different data sources, alignment protocols and methods to estimate substitution rates leads to widely varying estimates of key parameters that define the coding sequence divergence of orthologous genes. Although complete genome sequence data are not available for all organisms, fragmentary sequence data can provide accurate estimates of substitution rates provided that an appropriate and consistent methodology is used and that differences in the estimates obtainable from different data sources are taken into account.
The contribution of exon-skipping events on chromosome 22 to protein coding diversity
(Cold Spring Harbor Laboratory Press, 2001) Hide, Winston A.; Babenko, Vladimir N.; van Heusden, Peter A.
Completion of the human genome sequence provides evidence for a gene count with lower bound 30,000–40,000. Significant protein complexity may derive in part from multiple transcript isoforms. Recent EST based studies have revealed that alternate transcription, including alternative splicing, polyadenylation and transcription start sites, occurs within at least 30–40% of human genes. Transcript form surveys have yet to integrate the genomic context, expression, frequency, and contribution to protein diversity of isoform variation. We determine here the degree to which protein coding diversity may be influenced by alternate expression of transcripts by exhaustive manual confirmation of genome sequence annotation, and comparison to available transcript data to accurately associate skipped exon isoforms with genomic sequence. Relative expression levels of transcripts are estimated from EST database representation. The rigorous in silico method accurately identifies exon skipping using verified genome sequence. 545 genes have been studied in this first hand-curated assessment of exon skipping on chromosome 22.
Transforming RNA-Seq gene expression to track cancer progression in the multi-stage early to advanced-stage cancer development
(Public Library of Science, 2023) Livesey, Michelle; Rossouw, Sophia Catherine; Blignaut, Renette
Cancer progression can be tracked by gene expression changes that occur throughout early-stage to advanced-stage cancer development. The accumulated genetic changes can be detected when gene expression levels in advanced-stage samples are less variable but show high variability in early-stage samples. Normalizing advanced-stage expression samples with early-stage samples and clustering the normalized expression samples can reveal cancers with similar or different progression and provide insight into clinical and phenotypic patterns of patient samples within the same cancer.
Changes in subcutaneous adipose tissue microRNA expression in response to exercise training in African women with obesity
(Nature Research, 2022) Pheiffer, Carmen; Dias, Stephanie; Pretorius, Ashley
The mechanisms that underlie exercise-induced adaptations in adipose tissue have not been elucidated, yet accumulating studies suggest an important role for microRNAs (miRNAs). This study aimed to investigate miRNA expression in gluteal subcutaneous adipose tissue (GSAT) in response to a 12-week exercise intervention in South African women with obesity, and to assess depot-specific differences in miRNA expression in GSAT and abdominal subcutaneous adipose tissue (ASAT). In addition, the association between exercise-induced changes in miRNA expression and metabolic risk was evaluated. Women underwent 12 weeks of supervised aerobic and resistance training (n = 19) or maintained their regular physical activity during this period (n = 12). Exercise-induced miRNAs were identified in GSAT using Illumina sequencing, followed by analysis of differentially expressed miRNAs in GSAT and ASAT using quantitative real-time PCR. Associations between the changes (pre- and post-exercise training) in miRNA expression and metabolic parameters were evaluated using Spearman’s correlation tests.
Role of Indigenous and local knowledge in seasonal forecasts and climate adaptation: A case study of smallholder farmers in Chiredzi, Zimbabwe
(Elsevier, 2023) Zvobgo, Luckson; Johnston, Peter; Olagbegi, Oladapo M.
Accessible, reliable and diverse sources of climate information are needed to inform climate change adaptation at all levels of society, particularly for vulnerable sectors such as smallholder farming. Globally, many smallholder farmers use Indigenous knowledge (IK) and local knowledge (LK) to forecast weather and climate; however, less is known about how the use of these forecasts connects to decisions and actions for reducing climate risks. We examined the role of IK and LK in seasonal forecasting and the broader climate adaptation decision-making of smallholder farmers in Chiredzi, Zimbabwe. The data were collected from a sample of 100 smallholder farmers. Seventy-three of the 100 interviewed farmers used IK and LK weather and climate forecasts, and 32% relied solely on IK and LK forecasts for climate adaptation decision-making. Observations of cuckoo birds, leaf-sprouting of Mopane trees, high summer temperatures, and Nimbus clouds are the main indicators used for IK and LK forecasts. The use of IK and LK climate forecasts was significantly positively associated with increasing farmer age and farmland size.
Using multiplex amplicon PCR technology to efficiently and timely generate Rift Valley fever virus sequence data for genomic surveillance
(MDPI, 2023) Juma, John; Konongoi, Samson L.; Nsengimana, Isidore
Rift Valley fever (RVF) is a febrile vector-borne disease endemic in Africa and continues to spread in new territories. It is a climate-sensitive disease mostly triggered by abnormal rainfall patterns. The disease is associated with high mortality and morbidity in both humans and livestock. RVF is caused by the Rift Valley fever virus (RVFV) of the genus Phlebovirus in the family Phenuiviridae. It is a tripartite RNA virus with three genomic segments: small (S), medium (M) and large (L). Pathogen genomic sequencing is becoming a routine procedure and a powerful tool for understanding the evolutionary dynamics of infectious organisms, including viruses. Inspired by the utility of amplicon-based sequencing demonstrated in severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) and Ebola, Zika and West Nile viruses, we report an RVFV sample preparation based on amplicon multiplex polymerase chain reaction (amPCR) for template enrichment and reduction of background host contamination.
