Data Bites - Introduction to Programmatic Data De-identification with R
July 15, 2026, 10:00 am to 11:00 am
Presenters: Eugene Barsky, Vanessa Choy, Grigory Artazyan
Location: Online
Workshop: Programmatic Data De-identification with R
This practical workshop, delivered by the UBC Library Research Data Management team, introduces programmatic approaches to de-identifying sensitive research data in R. Through hands-on exercises using a realistic survey dataset, participants will apply a structured workflow, from assessing privacy risks to exporting a shareable, de-identified dataset.
Participants will learn how to:
- Identify privacy risks in research data, including direct identifiers, dates, geographic variables, and free-text fields.
- Apply de-identification methods in R using dplyr, including removal, generalization, suppression, anonymization, and pseudonymization.
- Run quality assurance checks to confirm a dataset is sufficiently de-identified before sharing.
- Export a de-identified dataset and a data key file, and understand best practices for securely storing each.
---
To participate fully, you will need to install the latest versions of R and RStudio on your computer before the workshop:
- Install R from https://cran.rstudio.com/
- Install RStudio from https://rstudio.com/products/rstudio/download/#download
Note: This workshop provides a practical introduction to programmatic data de-identification. Participants are encouraged to consult their institutional privacy, legal, or compliance experts for guidance on specific datasets.
(A Zoom link will be sent to registrants 3 hours before the event starts.)