2017-02-24

Subhajit Sengupta, PhD, Pritzker School of Medicine, University of Chicago

"Novel Computational approaches for some key problems involving large-scale biological data"

In this talk, I will outline my experience of working on some challenging problems in computational biology. As a data scientist trained in computer science and mathematical statistics, I have had exposure to diverse research themes, ranging from computational neuroscience to image processing. However, my work in recent years has centered on next-generation sequencing (NGS) data and matrix-valued directional data. This talk will mainly focus on a few models for tumor heterogeneity, which illustrate the importance of a rigorous, multi-disciplinary approach to large-scale biological data analysis. Most tumors are heterogeneous, consisting of multiple subclones with unique genomes, a phenomenon called intra-tumor heterogeneity (ITH). Reconstructing such genetically divergent subpopulations of cells from NGS data is one of the major challenges in precision medicine. We construct an end-to-end dockerized pipeline for subclonal reconstruction of tumor samples. Using NGS data and Bayesian feature allocation models, we estimate tumor purity, the proportions of subclones, subclonal copy numbers, and related quantities, for a single sample as well as for multiple samples. Based on these estimates, a phylogenetic tree is constructed to show the relationships among subclones. Next, we incorporate a novel computational tool, LocHap, which is used to find local haplotypes mapped by paired-end short reads. Its output is then used to construct the subclonal architecture based on mutation pairs. I will conclude the talk by mentioning a few other collaborative projects that I have worked on.
