Encourages intellectual curiosity and problem solving

As a summer intern at HDI, I worked on the BRCA challenge to advance the current understanding of breast cancer variants. My role involved developing natural language processing algorithms using Python to parse free text data from hospitals into a uniformly structured dataset that could be used by geneticists and clinical bioinformaticians for ongoing data sharing and breast cancer variant interpretation. Everyone at HDI was very friendly and there is a culture that really encourages intellectual curiosity and problem solving. Throughout the internship, I was given continuous support and had individual software development classes from my mentor, which I greatly enjoyed.


A supportive environment

I applied to Health Data insight for an internship in my third year of maths, having found details on my university website. The project I initially applied for had already been filled but I was offered a different project which lasted for 6 weeks over the summer holiday. My project was to review the current RTDS (radiotherapy data set) and automate some processes associated with its handling; I started with no experience in SQL and after six weeks I felt completely comfortable working with relational databases.

Throughout my six weeks at HDI everyone I had the pleasure of meeting during my internship was extremely friendly and helpful. From the moment I started working there I felt extremely welcome and it was clear that this would be a supportive environment; indeed my supervisor made sure I had any help I needed. Overall, my internship was a wonderful experience, and I would thoroughly recommend it.


Investigating the use of machine learning

Utilising the huge amount of cancer data available at HDI, I am investigating the use of machine learning for a number of clustering and predictive tasks. When first tackling the problem I found that were electronic health records (EHR) difficult to deal with due to the vast number of categorical variables involved, after spending a week going through online courses I discovered a similarity between natural language processing (NLP) and EHR. Word2Vec is a popular technique that emerged out of NLP in 2013 that allows for meaningful vector representation of words such that we can pose the “King – Man + Woman =?” question and arrive at the result “Queen”. When applying the same technique to medical events similar meaningful representations are found. These vector representations allow for interesting visualizations of patient pathways which can then be used to look for differences in treatments and their effect on outcomes. For future work, I would like to investigate medical vectors in deep learning frameworks such as recurrent or convolutional neural networks to make predictions such as diagnosis or death.


I felt very much involved in the development of the project

I really enjoyed my experience working as an intern on the Simulacrum project at HDI.  Everyone was very friendly and a pleasure to work with.  Much of my internship was spent working closely with my line manager.  We had a number of new ideas for improving the Simulacrum data, but in the end we decided to work on developing a set of tools for testing the data.  Our tests calculate a score that measures the degree of closeness between distributions in the simulated and real datasets.  We then developed a method for generating these tests automatically.

During my internship at HDI I spent a lot of time playing with data in Oracle SQL.  The training and guidance I received on this was superb.  I was in constant close contact with my line manager and other members of the Simulacrum team.  There were a number of opportunities to attend official meetings and also to visit the office in London.  Overall I felt very much involved in the development of the project.  There were also opportunities to give presentations on my work.  These were well attended and very well organised.  I strongly recommend applying for an internship at HDI.