Forensic inference in Africa: Evaluating population structure, databases, and regional assignment accuracy

dc.contributor.author: Kasu, Mohaimin
dc.contributor.author: Morrow, Jessica Caroline Anne
dc.contributor.author: Lesaoana, Mpasi
dc.contributor.author: Brydon, Humphrey
dc.contributor.author: D’Amato, Maria Eugenia
dc.date.accessioned: 2026-04-02T12:21:52Z
dc.date.available: 2026-04-02T12:21:52Z
dc.date.issued: 2026
dc.description.abstract: This study reports novel allele frequencies for 21 aSTRs (autosomal short tandem repeats) from 538 individuals, as well as 11 triallelic profiles, representing seven Bantu-speaking groups in Southern Africa (Ndebele, Pedi, Phuthi, Tsonga, Sotho, Swati, and Xhosa). These data contribute to a comprehensive representation of the Southern Bantu (SB). The resulting SB reference database was evaluated for several forensic applications: extant diversity, population structure, the adequacy of alternative reference databases, and biogeographic ancestry prediction within Africa. Different analytical methods, including summary statistics, multivariate analyses (Multidimensional Scaling, MDS; Discriminant Analysis of Principal Components, DAPC), and Bayesian clustering, detected continental structure and identified four major clusters: Southern, Eastern, Western, and Horn of Africa. This observation motivated the evaluation of two practical applications of this information: one methodological (alternative reference frequency databases) and one predictive (biogeographic assignment). The adequacy of alternative reference databases for representing SB populations (STRidER South Africa, STRidER Africa, African American, and global datasets) was assessed by comparing reciprocal allelic coverage and shifts in random match probabilities (RMPs). Of the databases tested, STRidER Africa provided the closest representation of the SB. Population-level analyses indicated the need for a stratification correction (θ = 0.005 or 0.01) for SB populations. Intracontinental biogeographic prediction was assessed using an XGBoost machine learning classification model across four major African regions. The model’s balanced accuracy ranged from 80 % to 94 % across regions (94 % for the Horn of Africa, 87 % for Southern Africa, 84 % for Western Africa, and 80 % for Eastern Africa). The accuracy and limitations of regional assignment are discussed, along with its ethical implications. The approach used to assess reference databases can be extended to more general applications across Africa.
dc.identifier.citation: Kasu, M., Morrow, J., Lesaoana, M., Brydon, H. and D’Amato, M.E., 2026. Forensic inference in Africa: evaluating population structure, databases, and regional assignment accuracy. Forensic Science International: Genetics, p.103441.
dc.identifier.uri: 10.1016/j.fsigen.2026.103441
dc.identifier.uri: https://hdl.handle.net/10566/22163
dc.language.iso: en
dc.publisher: Elsevier Ireland Ltd
dc.subject: African Populations
dc.subject: Biogeographic Prediction
dc.subject: Machine Learning
dc.subject: Reference Databases
dc.subject: STRs (Short Tandem Repeats)
dc.title: Forensic inference in Africa: Evaluating population structure, databases, and regional assignment accuracy
dc.type: Article
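The abstract reports shifts in random match probabilities (RMPs) and a recommended stratification correction (θ = 0.005 or 0.01) but gives no formulas. As a minimal sketch, assuming the widely used NRC II Recommendation 4.10 (Balding-Nichols) single-locus formulas combined across loci with the product rule, the effect of θ on an RMP can be computed as below. The record does not state which θ formula the study applied. The allele frequencies and the helper names (`FREQS`, `locus_match_prob`, `profile_rmp`) are hypothetical and illustrative only; they are not the SB frequencies or the authors' code.

```python
from math import prod

# Hypothetical allele frequencies (locus -> {allele: frequency}).
# Illustrative values only; NOT the Southern Bantu data from the study.
FREQS = {
    "D3S1358": {"15": 0.30, "16": 0.25, "17": 0.20},
    "TH01": {"7": 0.35, "9": 0.20},
}


def locus_match_prob(freqs: dict, a: str, b: str, theta: float = 0.0) -> float:
    """Single-locus match probability with an NRC II Rec. 4.10 theta correction.

    With theta = 0 this reduces to p^2 (homozygote) and 2pq (heterozygote).
    """
    p_a, p_b = freqs[a], freqs[b]
    denom = (1 + theta) * (1 + 2 * theta)
    if a == b:  # homozygote a/a
        return (2 * theta + (1 - theta) * p_a) * (3 * theta + (1 - theta) * p_a) / denom
    # heterozygote a/b
    return 2 * (theta + (1 - theta) * p_a) * (theta + (1 - theta) * p_b) / denom


def profile_rmp(profile: dict, freq_table: dict, theta: float = 0.0) -> float:
    """Multiply single-locus probabilities (product rule, loci assumed independent)."""
    return prod(
        locus_match_prob(freq_table[locus], a, b, theta)
        for locus, (a, b) in profile.items()
    )


if __name__ == "__main__":
    profile = {"D3S1358": ("15", "16"), "TH01": ("7", "7")}
    for theta in (0.0, 0.005, 0.01):
        rmp = profile_rmp(profile, FREQS, theta)
        print(f"theta={theta}: RMP={rmp:.6f} (1 in {1 / rmp:,.0f})")
```

With θ = 0 the two-locus profile gives 2 × 0.30 × 0.25 × 0.35² = 0.018375; for these frequencies, raising θ to 0.005 and then 0.01 increases the RMP, which is why the correction is usually regarded as conservative.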
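The abstract reports a balanced accuracy for each African region but does not define how it was computed. One common reading, assumed here, is a one-vs-rest balanced accuracy per region, (sensitivity + specificity) / 2, derived from a multiclass confusion matrix; the overall multiclass balanced accuracy is the mean of per-class recalls, as in scikit-learn's `balanced_accuracy_score`. The confusion-matrix counts and function names below are hypothetical and do not reproduce the study's XGBoost results.

```python
from __future__ import annotations

# Region labels follow the four clusters named in the abstract.
REGIONS = ["Southern", "Eastern", "Western", "Horn of Africa"]

# Hypothetical confusion matrix: CM[i][j] = individuals from true region i
# predicted as region j. Counts are invented for illustration only.
CM = [
    [8, 2, 0, 0],
    [1, 7, 2, 0],
    [0, 1, 9, 0],
    [0, 0, 0, 10],
]


def one_vs_rest_balanced_accuracy(cm: list[list[int]]) -> list[float]:
    """Per-class (sensitivity + specificity) / 2, treating each class vs all others."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # true class k, predicted elsewhere
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class elsewhere
        tn = total - tp - fn - fp
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        scores.append((sensitivity + specificity) / 2)
    return scores


def macro_balanced_accuracy(cm: list[list[int]]) -> float:
    """Mean per-class recall (the usual multiclass balanced accuracy)."""
    recalls = [cm[k][k] / sum(cm[k]) for k in range(len(cm))]
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    for region, score in zip(REGIONS, one_vs_rest_balanced_accuracy(CM)):
        print(f"{region}: {score:.3f}")
    print(f"Macro balanced accuracy: {macro_balanced_accuracy(CM):.3f}")
```

One-vs-rest scores count correct rejections of the other regions as true negatives, so they typically exceed the plain per-region recall when specificity is high; which convention the study used should be checked against the full article.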

Files

Original bundle

Name: mohaimin_forensic_inference_in_africa_2026.pdf
Size: 8.78 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission