A new paper authored by NHS England, IQVIA, Amgen and Health Data Insight (HDI) highlights how HDI’s synthetic dataset, Simulacrum, is transforming cancer research in England. The paper, “Leveraging synthetic data to facilitate research: A collaborative model for analyzing sensitive national cancer registry data in England,” outlines a novel, privacy-compliant way of conducting research with administrative healthcare data, made possible by a collaborative model of cross-organisation working.

Simulacrum is a synthetic, publicly available version of the National Disease Registration Service (NDRS) Cancer Analysis System (CAS) that contains no real patient information but maintains the look and feel of the original data. This allows researchers to carry out preliminary data analysis, generate hypotheses and develop programming code, without accessing sensitive patient data.  

Once developed, this code can be run by HDI-NDRS partnership analysts on the real data to generate anonymised, aggregated outputs that directly support predefined research questions that benefit patients.  
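As a rough illustration of this workflow, the sketch below shows the kind of analysis function a researcher might develop against synthetic records and then hand over to be run unchanged on the real data. It is a minimal sketch under stated assumptions, not the partnership's actual code: the field name `cancer_site`, the toy rows and the small-count suppression threshold of 5 are all hypothetical and are not taken from the paper or from Simulacrum itself.

```python
from collections import Counter

# Hypothetical threshold (an assumption for this sketch): counts below
# this value are masked before any output leaves the secure environment.
SUPPRESSION_THRESHOLD = 5


def aggregate_counts(records, key, threshold=SUPPRESSION_THRESHOLD):
    """Count records per value of `key`, masking counts below `threshold`."""
    counts = Counter(row[key] for row in records)
    return {
        value: (n if n >= threshold else f"<{threshold}")
        for value, n in sorted(counts.items())
    }


# Toy rows standing in for a synthetic extract; the field name is invented.
synthetic_rows = (
    [{"cancer_site": "lung"}] * 7
    + [{"cancer_site": "prostate"}] * 6
    + [{"cancer_site": "bone"}] * 2
)

# Developed and checked on synthetic rows; the same function could later be
# called by an analyst on real records, returning only aggregated output.
print(aggregate_counts(synthetic_rows, "cancer_site"))
```

Because the function takes its records as an argument and returns only aggregated values, the same code can be tested on synthetic rows and later applied to real ones without modification.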

This innovative model reduces barriers to data access and accelerates research timelines. The study analysed 18 projects completed between 2021 and 2024, revealing impressive results: 

  • An average of 2.3 months from the start of code development to final results
  • 67% of projects needed only one session running code on the real data to get the desired outputs
  • An average turnaround of 14 days from data release request to delivery

The collaborative model has supported crucial research across multiple cancer types including lung, prostate, blood, bone, skin and gastric cancers. It demonstrates the value of synthetic data in facilitating efficient research while maintaining privacy compliance. The approach offers a scalable model that could be adopted by healthcare systems worldwide. 
