PhD Thesis

Silander Lab, Massey University, Auckland, New Zealand & Jones Lab, Genome Sciences Centre, Vancouver, BC, Canada September 2018 – August 2022

Machine Learning Methods for Colorectal Cancer Dataset Analysis

  • Developed a rank-based permutation approach for assigning p-values to features and feature sets from importance scores in random forest models
  • Created the R package Rf2pval to make this statistical method user-friendly and easy to implement, and to simplify visualization of its results
  • Created an RShiny app for visualizing colorectal microbiome dataset results from a colleague’s high-throughput microbial data analysis pipeline (MetaFunc)
  • Developed a multi-language bioinformatics pipeline for extensive machine-learning model development on large RNA-seq datasets using a high-performance computing cluster
  • Performed a feasibility study on combining RNA-seq gene data with microbial data derived from human-unmapped reads in a random forest model
  • Analyzed a novel colorectal cancer RNA-seq dataset using machine learning approaches
  • Trained, tested, and validated on an independent dataset three random forest models (genes, genes + microbes, and microbes alone) that differentiate colorectal cancer (CRC) anatomical side with 80 to 90% accuracy
  • Associated novel and known biomarkers discovered in the random forest models with either right- or left-sided colorectal cancers

Honours Thesis

Bieda Lab, University of Calgary September 2014 – April 2015

Advanced Computational Approaches to Omics Data Set Analyses

  • Created advanced R programs for combining gene expression microarray analysis data with ChIP-seq analysis data for pharmacogenomic applications
  • Data-mined large datasets obtained from NCBI GEO/SRA
  • Created scripts to automate the production of graphical displays for downstream ChIP-seq and gene expression microarray analysis results
  • Integrated gene expression analysis with visual pathway analysis by writing R programs that color-code genes within KEGG pathway diagrams based on their expression levels