Description: Next-generation sequencing technologies have generated a massive amount of DNA, RNA, and protein sequence data since their inception. However, data privacy policies often restrict sharing such data because of the risk of re-identifying the individuals from whom the sequences were generated. Even when all the data from a sequencing experiment is available, it is often insufficient to achieve statistical power or to train machine learning models. Ironically, despite this scarcity, the data sets are sometimes too large to share realistically with other researchers. In this thesis, I explore methods to overcome the challenges of data privacy and data gravity in bioinformatics research.
 

In collaboration with QIMR Berghofer and the RIKEN Center for Integrative Medical Sciences, we used federated methods to analyze genomic data from BioBank Japan in situ to classify variants of uncertain significance while preserving privacy. With the Department of Laboratory Medicine and Pathology at the University of Washington, we developed a statistical model demonstrating that responsibly shared clinical evidence alone can, within just a few years, classify variants of uncertain significance that occur at a rate of 1 in 100,000 people. With researchers from McGill University, we reviewed the state of the art in federated computing technologies and how well they satisfy the privacy restrictions of the General Data Protection Regulation. With researchers from NASA, Amazon, and Intel, we developed a federated learning framework that runs between terrestrial and space-borne compute infrastructure, laying the groundwork for subsequent experiments that avoid the need to transfer large datasets across astronomical distances. Finally, at NASA, we used a causal inference machine learning ensemble to infer a robust correlation between liver gene expression and a corresponding lipid density phenotype in space-flown mice.

Event Host: James Casaletto, Ph.D. Candidate, Biomolecular Engineering & Bioinformatics

Advisor: Benedict Paten

Join us in person or on Zoom: https://ucsc.zoom.us/j/91562764120?pwd=RTl3V0ZPVndPeDNxcG1WQW1iUnI4QT09
Passcode: 190302
