Please note that applications for this scheme have now closed for the 2018 intake.

What is the opportunity?

We are offering up to five internships working on data collected by the largest cancer registry in the world.

You will have the opportunity to undertake an exploratory project guided and supported by developers and analysts. There are six projects outlined below. However, if you have a specific project of your own in mind that you would like to undertake, please let us know.

Who can apply?

We are looking for innovative and enthusiastic students who are keen to learn more about coding, using big data to solve problems and developing tools to support its use. We are inviting applications from students at all stages of their education, from undergraduate to PhD.

We can only accept applications from students studying in the UK who are eligible to work full-time in the UK.

What skills/experience do I need?

You will need to be innovative, creative, numerate and have some working knowledge of data analysis, such as via Excel or SQL. Candidates from all areas of study are welcome, particularly if they have interest or experience in one or more of these areas: data visualisation and communication, working with large datasets, data and statistical modelling, machine learning.

“Throughout the internship, I was given continuous support and had individual software development classes from my mentor, which I greatly enjoyed.” (2017 intern)

What do we offer?

Students on the placement will receive support and mentoring from a senior member of the National Cancer Registration and Analysis team. You will be working on anonymous cancer data.

Each intern will receive £1,000 per month as a salary and to cover expenses. We will cover reasonable travel expenses should travel be a requirement of the placement.

When will I need to be available?

Placements are offered for two to three months between June and September. There will be a one-day induction in early June (date TBC) and a final-day presentation and prize-giving event at the end of September (date TBC).

Where will the placement be?

Staff are based across the UK – see project details below. Your location will be dependent on the staff member you are working with. The induction events will be in Cambridge or London.

How to apply for a summer placement

The summer placement is open to any student at a UK university who is eligible to work full-time in the UK.

To apply, please send your current CV and a covering letter outlining any achievements that you feel are relevant and why you want the placement. Please also include the title of the project that you are interested in working on from the list below OR outline your own project that you wish to pursue.

Closing date is midnight on Friday 16th February 2018. An email confirming receipt of your application will be sent within three working days.

For each placement, a number of applicants may be shortlisted and you may be required to attend an interview or complete a longer application.

“Throughout my six weeks at HDI everyone I had the pleasure of meeting during my internship was extremely friendly and helpful. From the moment I started working there I felt extremely welcome and it was clear that this would be a supportive environment” (2017 intern)


Q: I am a non-EU student studying at a University outside of the UK. Am I eligible for your Placement Scheme?

A: No, unfortunately we are not able to sponsor students from non-EEA countries.

Q: I am a non-EU student studying at a University within the UK. Am I eligible for your Placement Scheme?

A: No, unfortunately we are not able to offer the placement to non-EU students.

Q: Are students who are completing a Masters Degree eligible to apply for a summer placement?

A: Yes, we are not placing any restrictions on the level of education or range of qualifications an individual needs to be eligible.

Q: I did not get an email to confirm you have received my application. What should I do?

A: All candidates should receive an email within 3 working days of submitting an application. Please check your junk email and if you have not received confirmation please contact

“During my internship at HDI I spent a lot of time playing with data in Oracle SQL.  The training and guidance I received on this was superb.” (2017 intern)

This year’s project outlines:

Index of Suspicion: Predicting Cancer from Prescriptions

Main Aim: Health Data Insight has worked with Public Health England and the NHS Business Services Authority to create a database of England’s primary care prescriptions data. This has been linked to the Cancer Analysis System (CAS), a national database of all cancer diagnoses and treatment in England. The aim of the Index of Suspicion project is to use machine learning to identify patterns in medication prescribed prior to the diagnosis of cancer to derive an “index of suspicion” that will predict when a patient is at increased likelihood of developing subsequent cancer.
Brief Summary of the work involved: The exact direction of the internship is dependent on the intern’s interests, but possible areas include:

  • The development and/or refinement of machine learning or statistical methods.
  • The development and implementation of statistical testing of the validity of conclusions.
  • The assessment, dissemination, and/or implementation of conclusions. How significant are the results? What are the best ways to communicate this, and what are the possible impacts on patients?
  • Human interpretability of models.
Skills Required: This project involves analysing datasets containing anonymised personal information, so information governance training will be provided.
Creativity and an interest in cancer research are expected. A mathematics/computer-science background, particularly experience in machine learning or statistics, would be highly useful.
Skills Desired: Experience using MATLAB, R, or NumPy/pandas would be beneficial, as would experience with SQL.
Project base: Cambridge


Assessment of data completeness in the National Cancer Registry and the impact on the production of Cancer Survival Statistics

Main Aim: We are seeking a new perspective in the assessment of data completeness in the Cancer Analysis System (CAS), which holds the data from the National Cancer Registry, and how this impacts on the production and development of cancer survival statistics, on which we (PHE) collaborate with the Office for National Statistics (ONS). These statistics are released on an annual basis for the Department of Health (DH) and other external stakeholders.
Brief Summary of the work involved: In this internship, there will be elements of learning/training which will be provided by the supervisors. The project will involve a combination of the following, depending on the intern’s interests, or an alternative project if proposed by the intern and mutually agreed:

  • Structured querying of the national cancer database.
  • Analyse and produce a report on the patterns of missing data.
  • Develop algorithm(s) using multiple imputation methodology to deal with issues of missing data.
  • Produce Cancer Survival Statistics using the methodology developed for the Official Statistics publications in conjunction with the implementation of the newly developed algorithm(s) to deal with missing data.
  • Conduct a sensitivity analysis of the impact of missing data on the Cancer Survival Statistics.
  • Produce a peer-reviewed publication or internal report on the impact of missing data on Cancer Survival Statistics.
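As a rough illustration of the "patterns of missing data" step above, the sketch below summarises per-field completeness and counts how often each combination of missing fields occurs. The records and field names are made up for illustration and are not taken from the Cancer Analysis System; it is a minimal sketch, not the method used for the Official Statistics.

```python
from collections import Counter

# Made-up records for illustration only; None marks a missing value.
# The field names are hypothetical, not real CAS column names.
records = [
    {"stage": "2", "grade": "G1", "ethnicity": None},
    {"stage": None, "grade": "G2", "ethnicity": None},
    {"stage": "3", "grade": "G2", "ethnicity": "A"},
    {"stage": None, "grade": None, "ethnicity": "B"},
]
FIELDS = ["stage", "grade", "ethnicity"]


def completeness(rows, fields):
    """Return the fraction of rows with a non-missing value, per field."""
    return {f: sum(r[f] is not None for r in rows) / len(rows) for f in fields}


def missing_patterns(rows, fields):
    """Count how often each combination of missing fields occurs."""
    return Counter(tuple(f for f in fields if r[f] is None) for r in rows)


print(completeness(records, FIELDS))
print(missing_patterns(records, FIELDS))
```

A summary like this would typically be a first step before choosing an imputation model, since it shows which fields tend to be missing together.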
Skills Required: The project involves analysing datasets containing anonymised personal information, so information governance training will be provided. Creativity and an interest in cancer research are expected. Some knowledge of mathematics, statistics, and probability is required.
Skills Desired: An interest in statistical theory and, in particular, experience with multiple imputation (MI). Experience using SQL or statistical software (such as Stata) would be highly beneficial for this internship.
Project base: Birmingham


What can we learn about cancer by modelling the data on it?

Main Aim: The National Cancer Registration and Analysis Service holds data on all diagnoses and treatments of cancer in England. The Simulacrum project has created models for these datasets, which have been used to generate synthetic data to allow researchers to explore individual-level cancer data without threatening the privacy of individual patients.
Brief Summary of the work involved: You would use statistics and machine learning, together with the data we hold, to look at questions like:

  • What do these models tell us about cancer directly?
  • Can these models identify data quality problems with the underlying data?
  • How can we improve the modelling methodology?
  • Can we be more data-driven while respecting privacy restrictions?
  • What interesting questions might such models suggest people ask?
Skills Required: This project involves datasets containing anonymised personal information, so information governance training will be provided. Creativity and an interest in cancer research are expected.
Skills Desired: An interest in statistical theory, probability, and machine learning. Experience using MATLAB, SQL, or statistical software. An interest in working with real-world cancer data.
Project base: Cambridge


Develop a tool for inferring symptoms from prescription histories for cancer patients

Aims: Prescriptions filled in the community setting in England are recorded in the Prescriptions Dataset. These prescriptions are made by healthcare professionals who select pharmaceuticals with indications appropriate for each patient’s illness. Is it possible to reverse engineer this process for cancer patients? This project will look at leveraging Prescriptions data, the Cancer Analysis System, Hospital Episode Statistics, and the British National Formulary, using supervised and unsupervised machine learning techniques in combination with rule-based systems, to attempt to develop a tool for inferring the symptoms of cancer patients.
Skills Required: SQL and R
Skills Desired: Exposure to machine learning techniques and/or statistics
Project base: London or Cambridge


Translating novel cancer datasets into innovative visualisations

Main Aim: The National Cancer Registration and Analysis Service (NCRAS) produces a wide range of statistical analysis and datasets used by many different audiences, including the public. Prominent productions include the Routes to Diagnosis dataset which examines the events in a patient’s pathway that lead to diagnosis, and the Get Data Out project which makes granular data more widely available while preserving individual patients’ privacy.  Such datasets drive research and transparency, as well as contributing to the improvement of patient outcomes.
Brief Summary of the work involved: This project aims to create a wide range of accessible visualisations from these large datasets. There will be three main strands as part of this:

  • To translate the main stories for a dataset into an accurate and accessible visualisation.
  • To enable elements of interactivity in the visualisations, with the ability to adapt the display to audience needs (e.g. to toggle confidence intervals on a display).
  • To produce visualisations that are scalable and adaptable – that can be easily updated as the datasets expand to include extra years or new cancer sites for example.

These visualisations will be used on websites, published outputs, for presentations, and at conferences to promote cancer analysis and help grant increased insight into otherwise complex statistics.

Skills Required: This project involves datasets containing anonymised personal information, so information governance training will be provided. Creativity and an interest in cancer research are expected. The ability to interpret complex data and translate it into a graphical display will be essential.
Skills Desired: An interest in public health and communication of data and health information, data journalism, or statistical exploration of data.
Experience using R, JavaScript, or other programming languages, experience with visual analytics packages such as Tableau or SAS VA, and an interest in the use of graphics would all be beneficial. Experience with D3 is a bonus.
Project base: London (Skipton House)


Mapping laboratory reports for molecular genetic testing to the National Cancer Registration and Analysis Service (NCRAS)

Main Aim: Background

Many tumours undergo molecular genetic and cytogenetic testing in NHS specialist laboratories in order to define which mutations or rearrangements underlie the malignant behaviour of the cells. These molecular tests are a key part of diagnosis, prognosis, and subtyping for many tumour types and can increasingly enable treatment to be personalised to the patient and to the specific biology of their tumour.

Brief Summary of the work involved: The Project

The National Cancer Registration and Analysis Service (NCRAS) is collecting molecular data from tumour testing directly from laboratories across England. Data from each laboratory arrives in a different format, with little consistency between laboratories. Most of the labs performing these tests have been identified; however, not all are yet supplying data to NCRAS.

All source data needs to be mapped to a specific format, contained within three genetics tables in the NCRAS database. Data is processed by a combination of computational mapping and registration by hand; however, computational mapping is the preferred route where possible.

The exact details of the project will depend upon the specific skills and interests that the intern can contribute to the work. However, it is likely to involve a combination of creating mapping documents to show how source data can be mapped and transformed to the unified structure, and scripting code, into which these rules are embedded, using YAML (YAML Ain’t Markup Language). There may also be an element of liaising with source laboratories to clarify any ambiguities within the data, and to improve the data quality where necessary.
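To give a flavour of what rule-based mapping with regular expressions can look like, here is a minimal Python sketch. The rules, gene list, report wording, and output fields are all hypothetical and are not the actual NCRAS mapping rules or table structure; in practice such rules might be held in a YAML file rather than written inline.

```python
import re

# Hypothetical mapping rules, written inline for illustration. Each rule
# pairs a regular expression with the status it implies. Order matters:
# the "no mutation" rule is checked first so that it is not swallowed by
# the more general "mutation detected" rule.
RULES = [
    {"pattern": r"\b(?P<gene>BRAF|KRAS|EGFR)\b.*\bno\s+mutation\s+detected\b",
     "status": "normal"},
    {"pattern": r"\b(?P<gene>BRAF|KRAS|EGFR)\b.*\bmutation\s+detected\b",
     "status": "abnormal"},
]


def map_report(text):
    """Return a unified record for the first matching rule, or None."""
    for rule in RULES:
        match = re.search(rule["pattern"], text, flags=re.IGNORECASE)
        if match:
            return {"gene": match.group("gene").upper(),
                    "status": rule["status"]}
    # Reports that match no rule would need registration by hand.
    return None


print(map_report("KRAS: no mutation detected"))
print(map_report("braf V600E mutation detected"))
print(map_report("Sample insufficient for testing"))
```

Keeping the rules as data, separate from the code that applies them, is what makes it practical to maintain a different rule set for each laboratory's format.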

The project will make a tangible difference to ongoing work looking at equity of access to molecular tumour testing within the NHS, and will be of interest to CRUK, NHS providers, and commissioners.

Skills Required: We would like an enthusiastic and creative intern, with an understanding of tumour genetics and specific awareness of the different types of molecular and cytogenetic aberrations that underlie tumourigenesis. The candidate must demonstrate confidence and accuracy with interpreting and processing large datasets. The intern must have the ability to spot patterns and inconsistencies, and to pick out the scientific details of the data without losing the overall context of the clinical report.
Skills Desired: Experience in bioinformatics and data mapping would be an advantage, as would familiarity with mark-up languages – specifically scripting rules based on Regular Expressions.
Project base: Birmingham

