Ten quick tips for protecting health data using de-identification and perturbation of structured datasets

dc.contributor.author: Lulamba, Tshikala Eddie
dc.contributor.author: Mutemaringa, Themba
dc.contributor.author: Tiffin, Nicki
dc.date.accessioned: 2026-01-12T10:56:17Z
dc.date.available: 2026-01-12T10:56:17Z
dc.date.issued: 2025
dc.description.abstract: Structured patient data generated within the health data ecosystem are shared both internally for operational use and externally for research and public health benefit. Protecting individual privacy and health data confidentiality in these contexts relies on data de-identification and anonymisation, although there are no universally accepted standards for these processes and the techniques involved can be technically complex. We present practical recommendations grounded in the principle of data minimisation: avoiding unnecessary granularity and identifying variables that could lead to re-identification when combined with other datasets. We provide practical guidance for anonymising and perturbing structured health data in ways that support compliance with data protection laws, describing technical and operational methods for reducing re-identification risk. These methods include rounding numerical values, replacing precise values with ranges, adding jitter to numeric fields, aggregating data, managing date values, and separating sensitive fields from identifying data to prevent linkage that could lead to re-identification. While some methods require advanced technical knowledge, we focus here on accessible strategies that can be implemented without specialist expertise, recognising the importance of the legal and governance frameworks in which anonymisation occurs. These guidelines support researchers, data managers and institutions in sharing health data responsibly, maintaining data utility while upholding privacy and promoting ethical and legal data stewardship for data-driven health research.
dc.identifier.citation: Lulamba, T.E., Mutemaringa, T. and Tiffin, N., 2025. Ten quick tips for protecting health data using de-identification and perturbation of structured datasets. PLOS Computational Biology, 21(9), p.e1013507.
dc.identifier.uri: https://doi.org/10.1371/journal.pcbi.1013507
dc.identifier.uri: https://hdl.handle.net/10566/21662
dc.language.iso: en
dc.publisher: Public Library of Science
dc.subject: Health data anonymisation
dc.subject: Re-identification risk reduction
dc.subject: Data minimisation principles
dc.subject: Privacy-preserving data techniques
dc.subject: Ethical health data sharing
dc.title: Ten quick tips for protecting health data using de-identification and perturbation of structured datasets
dc.type: Article
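
Illustrative sketch (not taken from the article): the abstract names rounding, replacing precise values with ranges, adding jitter and managing date values as accessible perturbation methods. The following minimal Python example shows one way these could look in practice. The function names, the parameter defaults and the sample record are hypothetical, and applying a single random offset to all dates in a record is an assumption made here to keep intervals between dates intact; the article itself may recommend different choices.

```python
import random
from datetime import date, timedelta


def round_to(value, base=5):
    """Round a numeric value to the nearest multiple of `base`."""
    return base * round(value / base)


def to_range(value, width=10):
    """Replace a precise integer with the range (bin) that contains it."""
    lower = (value // width) * width
    return f"{lower}-{lower + width - 1}"


def add_jitter(value, max_shift, rng):
    """Add uniform random noise in [-max_shift, +max_shift] to a number."""
    return value + rng.uniform(-max_shift, max_shift)


def shift_date(d, offset_days):
    """Shift a date by a fixed number of days (may be negative)."""
    return d + timedelta(days=offset_days)


if __name__ == "__main__":
    rng = random.Random()  # seed only for testing, never for release data
    # Hypothetical record; field names are invented for illustration.
    record = {
        "age": 37,
        "weight_kg": 73.4,
        "admitted": date(2024, 3, 14),
        "discharged": date(2024, 3, 20),
    }
    # One offset per record keeps the length of stay unchanged (assumption).
    offset = rng.randint(-7, 7)
    perturbed = {
        "age_range": to_range(record["age"], width=10),
        "weight_kg": round_to(add_jitter(record["weight_kg"], 2.0, rng), base=5),
        "admitted": shift_date(record["admitted"], offset),
        "discharged": shift_date(record["discharged"], offset),
    }
    print(perturbed)
```

Choices such as the bin width, the rounding base and the jitter size trade data utility against re-identification risk and would need to be set for each dataset within the applicable legal and governance framework.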

Files

Original bundle
Name: lulamba_ten_quick_tips_for_protecting_2025.pdf
Size: 820.17 KB
Format: Adobe Portable Document Format