What is the opportunity?
We are offering up to five internships working on data collected by the largest cancer registry in the world.
You will have the opportunity to undertake an exploratory project guided and supported by developers and analysts. There are seven projects outlined below. However, if you have a specific project of your own in mind that you would like to undertake, please let us know.
Who can apply?
We are looking for innovative and enthusiastic students who are keen to learn more about coding, using big data to solve problems and developing tools to support its use. We are inviting applications from students at all stages of their education, from undergraduate to PhD.
We can only accept applications from students studying in the UK who are eligible to work full-time in the UK.
What skills/experience do I need?
You will need to be innovative, creative, numerate and have some working knowledge of data analysis, such as via Excel or SQL. Candidates from all areas of study are welcome, particularly if they have interest or experience in one or more of these areas: data visualisation and communication, working with large datasets, data and statistical modelling, machine learning.
“Throughout the internship, I was given continuous support and had individual software development classes from my mentor, which I greatly enjoyed.” (2017 intern)
What do we offer?
Students on the placement will receive support and mentoring from a senior member of the National Cancer Registration and Analysis team. You will be working on anonymous cancer data.
Each intern will receive £1,000 per month as a salary and to cover expenses. We will cover reasonable travel expenses should this be a requirement of the placement.
When will I need to be available?
Placements are offered for two to three months between June and September. There will be a one-day induction in early June (date TBC) and a final presentation and prize-giving event at the end of September (date TBC).
Where will the placement be?
Staff are based across the UK – see project details below. Your location will be dependent on the staff member you are working with. The induction events will be in Cambridge or London.
How to apply for a summer placement
The summer placement is open to any student at a UK university who is eligible to work full-time in the UK.
To apply, please send your current CV to email@example.com, together with a covering letter outlining any achievements that you feel are relevant and why you want the placement. Please also include the title of the project from the list below that you are interested in working on, OR outline your own project that you wish to pursue.
The closing date is midnight on Friday 22nd February 2019. An email confirming receipt of your application will be sent within three working days.
For each placement, a number of applicants may be shortlisted and you may be required to attend an interview or complete a longer application.
“Throughout my six weeks at HDI everyone I had the pleasure of meeting during my internship was extremely friendly and helpful. From the moment I started working there I felt extremely welcome and it was clear that this would be a supportive environment” (2017 intern)
Q: I am a non-EU student studying at a university outside the UK. Am I eligible for your Placement Scheme?
A: No. Unfortunately, we are not able to sponsor students from non-EEA countries.
Q: I am a non-EU student studying at a university within the UK. Am I eligible for your Placement Scheme?
A: No. Unfortunately, we are not able to offer the placement to non-EU students.
Q: Are students who are completing a Master's degree eligible to apply for a summer placement?
A: Yes. We are not placing any restrictions on the level of education or range of qualifications an individual needs to be eligible.
Q: I did not get an email to confirm you have received my application. What should I do?
A: All candidates should receive an email within three working days of submitting an application. Please check your junk email folder and, if you have not received confirmation, please contact firstname.lastname@example.org.
“During my internship at HDI I spent a lot of time playing with data in Oracle SQL. The training and guidance I received on this was superb.” (2017 intern)
This year’s project outlines:
Data visualisation for improved engagement with public cancer data
Main Aim: The Get Data Out programme at Public Health England is seeking a data visualisation intern to derive insights from, and enable public engagement with, openly available data on cancer.
Brief Summary of the work involved: The Get Data Out programme routinely publishes cancer statistics produced by PHE in a consistent Standard Output Table: a table that collects patients into groups with common characteristics and then publishes information such as incidence, survival, treatment rates and routes to diagnosis for these standard groups. Currently Get Data Out covers brain, ovarian, pancreatic and testicular tumours, and we hope to expand this output in the near future. All data and metadata can be found on our website: https://cancerdata.nhs.uk/standardoutput.
Get Data Out has been welcomed by the cancer community, including analysts and charities focussed on rare and less common cancers, but we believe that more can be done to make these data accessible to as wide an audience as possible, and we hope to use data visualisation to improve engagement with this information.
You will have scope to influence the output of the project based on your own preliminary findings, existing skills and learning preferences. Most of our existing projects use R and RShiny.
Skills Required:
Education in a relevant discipline, including data analysis and visualisation.
Some knowledge of mathematics and statistics.
Some programming experience, preferably using R or Python to work with data.
Creativity and an interest in cancer research.
Enthusiasm and willingness to learn.
Verbal, written and data communication skills.
Skills Desired: Knowledge of data visualisation tools, especially with a web focus (RShiny or d3.js would be a particular bonus).
Analysis and quality of synthetic radiotherapy data
Brief Summary of the work involved: As part of the Simulacrum project, Health Data Insight CIC is developing a synthetic version of the National Radiotherapy Dataset, which has been collected and organised by Public Health England since April 2016. The purpose of the National Radiotherapy Dataset is to collect consistent and comparable data from all providers of radiotherapy services in England, in order to provide intelligence for service planning, commissioning, clinical practice, research and the operational provision of radiotherapy services across England.
The synthetic radiotherapy dataset is in the early stages of development. As part of the Simulacrum team, you will run quality checks on the synthetic data. Some of these checks will involve computing metrics that compare distributions in the real and synthetic data. In addition, you will use clinical insight into radiotherapy treatment to sense-check the synthetic data and then advise on potential improvements. You will analyse and critique the standard techniques for generating synthetic data, and devise new and innovative approaches for fixing specific issues.
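As an illustration of the kind of comparison metric such a check might use (a sketch only, not the Simulacrum team's actual quality checks), the total variation distance measures how far apart the category proportions of one field are in the real and synthetic data. The field and the values below are hypothetical toy data.

```python
from collections import Counter


def category_proportions(values):
    """Return each category's share of the records in `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def total_variation_distance(real, synthetic):
    """Half the L1 distance between two categorical distributions.

    0 means identical proportions; 1 means the distributions do not overlap.
    """
    p = category_proportions(real)
    q = category_proportions(synthetic)
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# Hypothetical field: number of fractions per treatment course (toy values).
real_fractions = [1, 5, 5, 20, 20, 20, 15, 5]
synthetic_fractions = [1, 5, 20, 20, 20, 20, 15, 1]

print(round(total_variation_distance(real_fractions, synthetic_fractions), 3))  # → 0.25
```

A single number per field is easy to track across generator versions, although it says nothing about relationships between fields, which would need joint or conditional comparisons.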
During this internship you will work very closely with analysts and developers operating at the forefront of synthetic data development. Your solutions for improving the synthetic data will be tested and the development team will work to implement your solutions in the generation process. There will also be opportunities to learn about the details of synthetic data generation and some of the mathematical techniques used to protect patient confidentiality.
Developing Machine Learning Models for Cancer Prediction and Patient Phenotyping
Brief Summary of the work involved: Health Data Insight has worked with Public Health England and the NHS Business Services Authority to develop a methodology for creating a database of England’s primary care prescriptions data. This has been linked to the Cancer Analysis System (CAS), a national database of all cancer diagnoses and treatment in England. The aim of the Index of Suspicion project is to use machine learning to identify patterns in the medication prescribed before a cancer diagnosis, together with other patient data, in order to derive an “index of suspicion” that predicts when a patient is at increased likelihood of subsequently developing cancer.
As an intern working on this project, you will work with a team of analysts and developers to further develop and improve the current machine learning methods and algorithms, in order to better understand prediagnostic prescribing and to strengthen prediagnostic indicators of cancer into a strong predictive signal. Key difficulties of the project are the size and complexity of the prescriptions dataset, which has over a billion rows, and the general issues that arise from working with real patient data. As part of this internship, you will have the opportunity to learn a range of skills in machine learning in healthcare and data analysis, as well as core transferable skills such as working as part of a team, managing and delivering projects, and developing technical solutions to meet the needs of the HDI-NCRAS team and of the patients, clinicians, and other individuals and organisations who will use the findings of the work. The internship also offers valuable experience of working in the competitive data science industry.
Possible directions for the project include combinations of the following, or alternative directions to be mutually agreed:
Supporting other analytical work within the team.
Science communicator internship – communicating the value of real world data
Brief Description: We are looking for a creative scientific copywriter with graphic design skills to join a small team within Public Health England and bring a fresh perspective to how we communicate with our stakeholders about the uses and benefits of sharing data. With a creative flair for language, design and layout, you will be someone who isn’t afraid to challenge the status quo, producing informative and visually appealing scientific communications that strengthen how the PHE Office for Data Release communicates with patients, the public and our customers. With you, we want to maximise the reach of our communications and make the process of applying for PHE data, and of understanding how it is used, more relevant, accessible and understandable.
Working with ODR programme managers, you will provide essential copywriting and graphic design support to develop and implement the ODR communication strategy and the effective development of the ODR’s visual identity by:
This is a challenging role that will allow you to expand your knowledge of health, public health and the utility of data for medical research and service improvement, while getting your foot in the door of the competitive health and science communications industry.
Development of an epidemiology toolkit for rare cancer data using the National Cancer Registry
Brief Description of Project: We are seeking a candidate to develop a new epidemiology toolkit that will enable NCRAS analysts to apply new analyses and methods with ease, increasing the efficiency of our work. The toolkit will be targeted at rare cancers because of the issues posed by small numbers: if the methodology works with small numbers, the method(s) can be extended to larger cohorts.
The candidate will assess the current survival methodologies and whether they are feasible for rare cancers, which have very small numbers of cases. This matters because the standard non-parametric methods usually only work well for groups with a large sample size.
A new adaptation (Brenner’s alternative) has recently been coded in Stata to allow non-parametric methods to produce net survival estimates for very small groups, but it still has disadvantages: the final output must be age-standardised, and the method requires at least one person in each defined age group.
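To show where the one-person-per-age-group restriction comes from, here is a minimal sketch of the weighting step of Brenner's alternative approach to age adjustment, under the assumption that each patient is weighted by the standard population share of their age group divided by that group's share of the cohort. It is not the Stata implementation mentioned above and not a full net survival estimator; the age bands and standard weights are hypothetical, not an official standard population.

```python
from collections import Counter


def brenner_weights(age_groups, standard_weights):
    """Individual weights for Brenner's alternative approach to age adjustment.

    A patient in age group k gets weight W_k / (n_k / n), where W_k is the
    standard population share of group k, n_k is the number of patients in
    group k and n is the total number of patients. After weighting, the cohort
    has the standard age structure.
    """
    n = len(age_groups)
    counts = Counter(age_groups)
    empty = [k for k in standard_weights if counts[k] == 0]
    if empty:
        # An age group with no patients cannot be reweighted to its standard
        # share, which is why at least one person per group is required.
        raise ValueError(f"no patients in age group(s): {empty}")
    return [standard_weights[g] / (counts[g] / n) for g in age_groups]


# Hypothetical standard weights (they sum to 1) and a tiny hypothetical cohort.
standard = {"15-44": 0.3, "45-64": 0.4, "65+": 0.3}
patients = ["15-44", "45-64", "45-64", "65+"]

weights = brenner_weights(patients, standard)
print(weights)
```

The weights would then enter a weighted life-table or Pohar Perme-type estimator; with very small cohorts a single patient can carry a large weight, which is one of the disadvantages the project would examine.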
The aim of the first part of this internship programme is to (1) assess the viability of the current non-parametric methods in the production of survival and mortality statistics in rare cancers and (2) extend or develop new methods to accurately estimate survival and mortality for such small groups.
The candidate will fit simple regression models to assess the pattern of incidence over time and will either develop a new model or adapt age-period-cohort (APC) models to project incidence accurately into the future. This will be developed for the rare cancer setting first: if the method is robust enough for small cohorts, it will also work for larger ones.
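One simple regression model of this kind, sketched here as an assumption rather than NCRAS's method, is a Poisson log-linear trend: case counts are modelled as person-years times exp(a + b·year), which copes with the small and zero counts typical of rare cancers better than regressing on log rates. It is deliberately simpler than an APC model (no age or cohort terms). The function names and the counts in the usage example are hypothetical.

```python
import math


def fit_poisson_trend(years, cases, person_years, iterations=50, tol=1e-10):
    """Fit log E[cases] = log(person_years) + a + b * (year - centre).

    Uses Newton-Raphson on the Poisson log-likelihood. Returns (a, b, centre);
    exp(b) is the estimated annual rate ratio. Requires sum(cases) > 0.
    """
    centre = sum(years) / len(years)
    t = [y - centre for y in years]  # centring keeps the 2x2 system well conditioned
    # Start from the overall crude rate and a flat trend.
    a, b = math.log(sum(cases) / sum(person_years)), 0.0
    for _ in range(iterations):
        mu = [py * math.exp(a + b * ti) for py, ti in zip(person_years, t)]
        # Score vector: gradient of the log-likelihood.
        g0 = sum(y - m for y, m in zip(cases, mu))
        g1 = sum(ti * (y - m) for ti, y, m in zip(t, cases, mu))
        # Fisher information (negative Hessian) for the two parameters.
        h00 = sum(mu)
        h01 = sum(ti * m for ti, m in zip(t, mu))
        h11 = sum(ti * ti * m for ti, m in zip(t, mu))
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b, centre


def project_rate(a, b, centre, year, per=100_000):
    """Projected incidence rate per `per` person-years in `year`."""
    return per * math.exp(a + b * (year - centre))


# Hypothetical rare-cancer counts and a constant population at risk.
years = list(range(2010, 2017))
cases = [12, 9, 14, 15, 13, 18, 17]
person_years = [5_000_000] * len(years)

a, b, centre = fit_poisson_trend(years, cases, person_years)
print(f"annual rate ratio: {math.exp(b):.3f}")
print(f"projected rate in 2020: {project_rate(a, b, centre, 2020):.2f} per 100,000")
```

Extrapolating a single log-linear trend ignores cohort effects and uncertainty, which is exactly what the APC-based work described above would need to address.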
The aim of the second part of this internship programme is to (1) assess trends in incidence over time for rare cancer sites and (2) extend or develop new models to accurately project incidence into the future.
A combination of the following, depending on the intern’s interests, or an alternative project if proposed by the intern and mutually agreed:
• Produce a ‘toolkit’ program in Stata (similar to MATA) that will be circulated to the NCRAS analysts to use.
Skills Required: The project involves analysing datasets containing anonymised personal information, so information governance training will be provided. Creativity and an interest in cancer research are expected. Some knowledge of mathematics, statistics and probability is required.
Skills Desired: An interest in statistical theory and, in particular, experience with survival or mortality analyses. Experience using statistical software (such as Stata) would be highly beneficial for this internship.
Automation of Data Production Programming Internship
Brief Summary of the work involved: The Office for Data Release (ODR), as part of Public Health England, is responsible for providing a common governance framework for responding to requests to access data held by PHE for secondary purposes, including service improvement, surveillance and ethically approved research. The ODR is responsible for ensuring data governance and protection principles are applied to each release.
A large proportion of the data releases overseen by the ODR are to access cancer data. ODR staff work closely with analysts from PHE’s National Cancer Registration and Analysis Service (NCRAS) to respond to these requests. The data for servicing these requests are held in a large collection of linked datasets in Oracle databases. ODR and NCRAS have identified that the tasks undertaken in processing requests are similar across all requests. These include cohort definition, data extraction, data linkage, identifiability checks, pseudonymisation, aggregation, quality assurance and metadata production. However, much of this work is duplicated for each new request and is subject to differing interpretations.
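Two of the tasks listed above, pseudonymisation and aggregation with an identifiability check, can be sketched in a few lines to show what an automation tool might standardise. This is an illustration under stated assumptions, not ODR's procedure: keyed hashing (HMAC-SHA256) is one common pseudonymisation technique swapped in here for the example, the suppression threshold of 5 is hypothetical rather than a PHE disclosure rule, and the key and record fields are made up.

```python
import hashlib
import hmac
from collections import Counter


def pseudonymise(patient_id, key):
    """Replace an identifier with a keyed hash (HMAC-SHA256, hex-encoded).

    The same ID and key always give the same pseudonym, so records can still
    be linked within one release, but without the key the pseudonym cannot
    practically be traced back to the ID.
    """
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def aggregate_with_suppression(records, group_field, threshold=5):
    """Count records per group; counts below `threshold` become None (suppressed)."""
    counts = Counter(record[group_field] for record in records)
    return {group: (n if n >= threshold else None) for group, n in sorted(counts.items())}


# Hypothetical key: in practice it would be stored securely, never in code.
KEY = b"replace-with-a-securely-stored-secret"
records = [{"id": f"patient-{i:03d}", "site": "brain"} for i in range(6)]
records += [{"id": f"patient-{i:03d}", "site": "testis"} for i in range(6, 8)]

released = [{"pseudonym": pseudonymise(r["id"], KEY), "site": r["site"]} for r in records]
print(aggregate_with_suppression(released, "site"))
```

Encoding rules like these once, rather than re-implementing them per request, is the kind of consistency the automation programme is aiming for.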
ODR has initiated a programme of work to develop automation tools that support and standardise this work, with various deliverables identified, including:
Although this work will initially be developed in the context of cancer data, a further aim is for the model developed through this project to provide an exemplar to support the release of data from other data assets held by PHE.
We are offering an intern placement to help deliver this useful work. The placement would suit an individual with a good analytical background who enjoys problem solving and has strong attention to detail. It offers an opportunity to support a programme of work with clearly defined expectations and to deliver operational software solutions. The outputs from this placement will improve the efficiency of both the ODR and PHE analytical teams, and provide an excellent opportunity for the intern to develop skills and knowledge while demonstrating competency through successful project delivery.
Project base: Supervision will depend on location.
Visualisation: Unlocking the Potential of Cancer Data
Brief Summary of the work involved: From informed patient choices and symptom awareness campaigns to communicating complex discoveries in cancer research to the clinicians who will decide an individual’s treatment options, high-quality patient care relies on effective communication. A good visualisation has the power to communicate a message with far greater impact than text or raw data: it can highlight key points, provide easy summaries and comparisons, and reveal patterns or trends over time.
Health Data Insight works with Public Health England’s National Cancer Registration and Analysis Service (NCRAS) to generate new insights from healthcare data and improve outcomes in healthcare. NCRAS aims to collect data on all cases of cancer diagnosed in England in order to improve cancer services and outcomes, improve patient care, and support in-depth research into all aspects of cancer causes, symptoms, progression and treatment effects. These data are stored in large, linked datasets that, in aggregate, contain information on all areas of the cancer pathway.
The aim of this internship will be to work with a team of analysts to devise and develop one or more high-impact visualisations that convey key messages identified from the cancer datasets. This internship will combine strong technical skills with a high level of creativity and innovation. It will provide the opportunity to gain experience of working in the competitive data science industry and to develop a wide range of skills for a collaborative working environment within a large public sector organisation, including managing and delivering projects, working as part of a team, and developing technical solutions to meet the needs of both the HDI-NCRAS team and the patients, clinicians, and other individuals and organisations who will use the visualisation to access and understand patient data.
Principal questions to consider would be:
• What are the key messages that patients, their families, and clinicians need to know?