Summer Research Project 2023

From June to August 2023, I took part in the School of Mathematical Sciences’ Summer Research Project. My supervisor was my lecturer and mentor, Dr Ho Weang Kee, a researcher with Cancer Research Malaysia who specialises in statistical genetics for breast cancer. Our research project was titled Polygenic Risk Scores for Prediction of Breast Cancer Subtypes.

The main goal of the project was to identify a method that would allow a single, universal Polygenic Risk Score (PRS) to be used for people of different ethnicities and nationalities. Current PRS perform differently across ethnicities because genetic makeup, which is what a PRS uses to estimate the risk of developing breast cancer, differs between populations.

Experience

For the first two weeks, I read research papers by Dr Ho and Dr Nasim Mavaddat about the PRS that have been developed, how they work and what issues have been documented. I also read about logistic regression and principal component analysis, as these were relevant to the project.

After that, I was given access to Cancer Research Malaysia’s data to perform my analysis. This included the alleles carried by the participants in the study, as well as their ethnicity and whether they had breast cancer. I explored the dataset and generated my own graph of the PRS by ethnicity.

Distribution of PRS by ethnicity

To determine the heterogeneity of the PRS between ethnicities, I used R to calculate Higgins and Thompson’s $I^{2}$ statistic and drew forest plots to visualise the differences.

Forest plot of mean of PRS for each ethnicity

For the remaining time, Dr Ho and I had weekly check-ins where I presented my progress and she gave feedback on my analysis and decided which direction we should explore next. I spent each week performing the analysis and recalculating the PRS after each candidate method was applied. We tried various methods to reduce the $I^{2}$ statistic, but most did not produce the desired results; the exception was the addition of principal components to the PRS calculation.

At the end of the three months, I wrote a report detailing the background of the problem, the methods we tested and the results.

PDF of the report on Polygenic Risk Scores

Thoughts

Since all the work was done remotely, I had time both to enjoy my summer and to continue my learning journey. I am grateful to Dr Ho for giving me this opportunity and for her guidance in improving my statistical and programming knowledge. My R programming skills improved greatly over the course of the project, as did my communication skills, as I learned to convey my results through clear visuals and presentations. I will continue to use these valuable skills in my future career.