Michael Sekula, Ph.D., Department of Bioinformatics and Biostatistics, University of Louisville

"Detection of differentially expressed genes in discrete single-cell RNA sequencing data using a hurdle model with correlated random effects"

Single-cell RNA sequencing (scRNA-seq) technologies are revolutionary tools that allow researchers to examine gene expression at the level of a single cell. Traditionally, transcriptomic data have been analyzed from bulk samples, which masks the heterogeneity now seen across individual cells. Even within the same cellular population, a gene can be highly expressed in some cells but not expressed (or lowly expressed) in others. Therefore, the computational approaches used to analyze bulk RNA sequencing data are not appropriate for the analysis of scRNA-seq data. Here, we present a novel statistical model for high-dimensional, zero-inflated scRNA-seq count data that identifies differentially expressed (DE) genes across cell types. Correlated random effects, based on an initial clustering of cells, are employed to capture cell-to-cell variability within treatment groups. The model is also flexible and can easily be adapted to an independent random effect structure if needed. We apply the proposed methodology to both simulated and real data and compare the results to other popular methods for detecting DE genes. Because the hurdle model can detect differences both in the proportion of cells expressing a gene and in the average expression level among the expressing cells, our method naturally identifies some genes as DE that other methods do not, and we demonstrate with real data that these uniquely detected genes are associated with similar biological processes and functions.
