
The Simulacrum


The Simulacrum is a dataset that contains artificial patient-like cancer data to help researchers gain insights. It imitates some of the data held securely by Public Health England’s National Cancer Registration and Analysis Service. The data is synthetic and does not contain any information about real patients. It is free to use and allows anyone who wants to work with record-level cancer data to do so, safe in the knowledge that while the data feels like the real thing, there is no danger of breaching patient confidentiality. The Simulacrum was developed by HDI in partnership with AstraZeneca and IQVIA and released on November 28, 2018.

Further details:

The National Cancer Registration and Analysis Service (NCRAS) in Public Health England (PHE) collects data on all patients diagnosed with cancer across the population of England and has permission to do so without specific patient consent under Section 251 of the NHS Act 2006. The data collected by PHE is then linked with other data to create the Cancer Analysis System (the CAS). The data is sensitive and confidential, and PHE has an absolute responsibility to protect patient privacy and to ensure that the duty of confidence is maintained by any users of the data. The CAS is an important and valuable data source used to understand patient care and outcomes and to support research.

To increase the accessibility of the data, Health Data Insight CiC developed the Simulacrum and made it freely available. The Simulacrum contains only artificial data that models some of the properties of the data collected by NCRAS; it contains no real patient data. This allows researchers, information specialists and the public to make use of the properties of the valuable information held by PHE in its Cancer Analysis System (CAS) to improve cancer outcomes in England, without ever compromising patient confidentiality.

The Simulacrum was developed by Health Data Insight (HDI) CiC in partnership with AstraZeneca (AZ) and IQVIA. HDI built the synthetic database, and AZ and IQVIA guided the development by providing data science, oncology and technology expertise, as well as testing the final product to ensure it met researchers’ needs.

No patient-identifiable data was released to any individuals or organisations outside of PHE. All data transfers and any access to sensitive data remained under the control of PHE’s Office for Data Release and the PHE Caldicott Guardian.

More details about the Cancer Analysis System, the expected benefits to patients, the protection of patient privacy and the parties involved are provided in the next sections:

The Cancer Analysis System

The National Cancer Registration and Analysis Service (NCRAS) managed by PHE has legal permission[1] to collect patient data on all patients diagnosed with cancer in England. The data have two main categories of use:

  • Primary: to inform the direct care and treatment of the patient (e.g. prescribing medicine)
  • Secondary: to inform healthcare planning, commissioning of services, delivery of care

Secondary users work with de-identified data compliant with the Information Commissioner’s Code of Anonymisation, meaning that researchers cannot see patients’ names, dates of birth, postcodes or other forms of protected health information.

The NCRAS data forms part of the CAS, where it is linked with other healthcare databases, e.g. Hospital Episode Statistics. The CAS provides a resource for a wide range of public health, healthcare and basic research. Access is highly controlled to ensure that patient confidentiality is always protected, and no data is released from PHE without the permission of the PHE Office for Data Release and the PHE Caldicott Guardian.

If you are a cancer patient and do not wish for your data to be used in PHE’s National Cancer Registration and Analysis Service, you can ask PHE to remove all of your details from the cancer registry at any time. This will not affect your treatment or care. If you wish to find out more, this leaflet has more information. If you wish to opt out of cancer registration, you can email optout@phe.gov.uk and one of our staff will respond to you within 48 hours.

Benefits to patients

Accurate diagnosis, high-quality treatment and the best outcomes are important to all patients. To make sure that we achieve this, NCRAS collects data on every patient with a diagnosis of cancer. This is sensitive and personal information and therefore must be held securely and only those with permission are allowed access to identifiable or potentially identifiable data. This can make it difficult for researchers to ask even basic questions of the data.

Protection of privacy

The main purpose of the Simulacrum is to facilitate research based on the data held in the CAS whilst protecting patient confidentiality. By using synthetic data in place of the real data, researchers can work with the data without using identifiable information about any individual.

About AstraZeneca

AstraZeneca is a global, science-led biopharmaceutical company that focuses on the discovery, development and commercialisation of prescription medicines, primarily for the treatment of diseases in three therapy areas – Respiratory, Cardiovascular & Metabolic Diseases, and Oncology. The Company is also selectively active in Neuroscience and Autoimmunity. AstraZeneca operates in over 100 countries and its innovative medicines are used by millions of patients worldwide. For more information please visit: www.astrazeneca.com


About IQVIA

IQVIA is a leading integrated information and technology-enabled healthcare service provider worldwide, dedicated to helping its clients improve their clinical, scientific and commercial results. Formed through the merger of Quintiles and IMS Health, IQVIA’s approximately 50,000 employees conduct operations in more than 100 countries. Companies seeking to improve real-world patient outcomes and enhanced clinical trial outsourcing through treatment innovations, care provision and access can leverage IQVIA’s broad range of healthcare information, technology and service solutions to drive new insights and approaches. IQVIA provides solutions that span clinical to commercial bringing clients a unique opportunity to realize the full potential of innovations and advance healthcare outcomes.

As a global leader in protecting individual patient privacy, IQVIA uses healthcare data to deliver critical, real-world disease and treatment insights. Through a wide variety of privacy-enhancing technologies and safeguards, IQVIA protects individual privacy while managing information to drive healthcare forward. These insights and execution capabilities help biotech, medical device, and pharmaceutical companies, medical researchers, government agencies, payers and other healthcare stakeholders to develop and gain approval for new therapies, identify unmet treatment needs and understand the safety, effectiveness and value of pharmaceutical products in improving overall health outcomes. To learn more, visit www.iqvia.com.

[1] Permission granted under Section 251 of the NHS Act 2006

In collaboration with:

AstraZeneca; IQVIA




Simulacrum 1 was released on November 28, 2018. See also our intern project ‘Testing the Simulacrum’.
