Bioinformatics and Biostatistics

Professor and University Scholar
Room No. 135, 485 E. Gray St.
Louisville, KY 40202
Phone: 502-852-0081
Fax: 502-852-3294




Ph.D. Statistics, 1995, University of Georgia, Athens, USA.
Dissertation Title: Dynamics of Cytonuclear Disequilibria and Related Statistical Tests for the Neutrality of Mitochondrial DNA Markers for Hybrid Zone Data (under the direction of Prof. Jonathan Arnold, Department of Genetics, University of Georgia, Athens)

M.S. Statistics, University of Georgia, Athens, USA.

B.S. Physics major, University of Calcutta, India.

Research Interests

Bioinformatics, Proteomics, Infectious Disease Modeling, Inference, Statistical Genetics, Statistical issues in Population Biology, Survival Analysis and Multistate models.

Honors and Awards

I am an elected member of the International Statistical Institute (ISI).

I am the President-elect of the Caucus for Women in Statistics, 2013-14.

My biography is included in the 2010 (64th) edition of Marquis Who's Who in America, and in Who's Who in the World, Who's Who in Science and Engineering, and Who's Who in Women.

I appeared on Fox News Atlanta, May 2004.

I am a co-recipient of the CURO Excellence in Undergraduate Research Mentoring Award from the University of Georgia, April 2002.

Press coverage (Atlanta Business Chronicle), April, 2002.

Press coverage (Georgia State Magazine), Fall, 2002.

Featured research faculty, College of Arts and Sciences, Feb. 2003.

Phi Kappa Phi honor society, April 2000.

Outstanding Junior Faculty Award nomination, Georgia State University, Atlanta, Georgia, April 2000.

Student paper award at the SRCOS/ASA summer conference, Melbourne, Florida, June 1995.

Best Theoretical Student Award, Department of Statistics, University of Georgia, Athens, Georgia, 1994.

Outside Interests

Singing and playing the harmonium, traveling, and recognizing and practicing acts of kindness.


Selected Publications

Book Chapters
1. Datta, S. and Arnold, J. (2002). Some comparisons of clustering and classification techniques applied to transcriptional profiling data. In Advances in Statistics, Combinatorics and Related Areas, Eds.: C. Gulati, Y-X. Lin, S. Mishra, and J. Rayner, World Scientific, 63-74.

2. Datta, S. and Pihur, V. (2009). Feature selection and machine learning with mass spectrometry data. In Clinical Proteomics: Methods, Applications and Tools, Ed.: R. Matthiesen, Humana Press, 205-229.

Journal Articles
1. Datta, S., Fu, Y. X., and Arnold, J. (1996). Dynamics and equilibrium behavior of cytonuclear disequilibria under genetic drift, mutation, and migration, Theoretical Population Biology, 50, 298-324.

2. Datta, S. and Arnold, J. (1996). Diagnostics and a statistical test of neutrality hypothesis using the dynamics of cytonuclear disequilibria, Biometrics, 52, 1042-1054.

3. Datta, S., Longini, I. M., and Halloran, E. (1997). Measuring vaccine efficacy for different HIV vaccine trials, Statistics in Medicine, 17, 185-200.

4. Datta, S., Halloran, E. M. and Longini, I. M. (1999). Efficiency of estimating vaccine efficacy for susceptibility and infectiousness: randomization by individual versus household, Biometrics, 55, 792-798.

5. Datta, S., Satten, G. A. and Datta, S. (2000). Nonparametric estimation for the three stage irreversible illness-death model, Biometrics, 56, 841-847.

6. Datta, S. and Datta, S. (2003). Comparisons and validation of statistical clustering techniques for microarray gene expression data, Bioinformatics, 19, 459-466.

7. G., Brehm, S., Datta, S., and Adams, M. W. W. (2003). Whole Genome DNA microarray of a hyperthermophile and an archaeon: Pyrococcus furiosus grown on peptides and carbohydrate, Journal of Bacteriology, 185, 3935-3947.

8. Datta, S., Satten, G. A., Benos, D. J., Xia, J., Heslin, M., and Datta, S. (2004). An empirical Bayes adjustment to increase the sensitivity of detecting differentially expressed genes in microarray experiments, Bioinformatics, 20, 235-242.

9. Warrenfeltz, Z., Pavlik, S., Datta, S., Kraemer, E., Benedict, B., and McDonald, J. F. (2004). Gene expression profiling of epithelial ovarian tumors correlated with malignant potential, Molecular Cancer, 3:27.

10. Datta, S. and Datta, S. (2005). Empirical Bayes screening (EBS) of many p-values with applications to microarray studies, Bioinformatics, 21, 1987-1994.

11. Datta, S. (2005). Statistics in Microarray Analysis, In Encyclopedia of Statistical Sciences, Second edition, Wiley, New York.

12. Datta, S. and Datta, S. (2006). Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes, BMC Bioinformatics, 7, 397. (Highly accessed)

13. Datta, S., Le-Rademacher, J. and Datta, S. (2007). Predicting patient survival from microarray data by accelerated failure time modeling using partial least squares and LASSO, Biometrics, 63, 259-271.

14. Pihur, V., Datta, S. and Datta, S. (2007). Weighted rank aggregation of cluster validation measures: A Monte Carlo cross-entropy approach. Bioinformatics, 23, 1607-1615.

15. Pihur, V., Datta, S. and Datta, S. (2008). Finding cancer genes through meta-analysis of microarray experiments: Rank aggregation via the cross entropy algorithm. Genomics, 92, 400-403.

16. Pihur, V., Datta, S. and Datta, S. (2008). Reconstruction of genetic association networks from microarray data: A partial least squares approach. Bioinformatics, 24, 561-568.

17. Datta, S., Turner, D., Singh, R., Ruset, B., Pierce, W. M., and Knudsen, T. B. (2008). Fetal alcohol syndrome in mice detected through proteomics screening of the amniotic fluid. Birth Defects Research Part A: Clinical and Molecular Teratology, 82, 177-186.

18. Yoo, J. K., Patterson, B. S. and Datta, S. (2009). OLS-based predictor test in single index model to predict transcription rate by histone acetylation level, Statistics & Probability Letters, 79(20), 2109-2114.

19. Atlas, M. and Datta, S. (2009). Monoisotopic Peak Detection for Mass Spectrometry Data, Journal of Proteomics and Bioinformatics, 2(5), 202-216.

20. Datta, S., Pihur, V. and Datta, S. (2010). An adaptive optimal ensemble classifier via bagging and rank aggregation with applications to high dimensional data, BMC Bioinformatics, 11:427.

21. Gill, R., Datta, S. and Datta, S. (2010). A statistical framework for differential network analysis from microarray data using partial least squares, BMC Bioinformatics, 11:95. PMC2838870.

22. Ndukum, J., Fonseca, L. L., Santos, H., Voit, E. O. and Datta, S. (2011). Statistical Inference Methods for Sparse Biological Time Series Data, BMC Systems Biology, 5:57. PMC3114728.

23. Li, X., Gill, R., Cooper, N. G. F., Yoo, J. K., and Datta, S. (2011). Modeling microRNA-mRNA Interactions Using PLS Regression in Human Colon Cancer, BMC Medical Genomics, 4:44. PMC3123543. (Highly accessed)

24. Ndukum, J., Atlas, M. and Datta, S. (2011). pkDACLASS: open source software for analyzing MALDI-TOF, Bioinformation, 6(1), 45-47. PMC3064853.

25. Manavalan, T. T., Teng, Y., Bhimani, S., Appana, S. N., Datta, S., Kalbfleisch, T. S., Li, Y., and Klinge, C. M. (2011). Differential expression of microRNA expression in tamoxifen-sensitive MCF-7 versus tamoxifen-resistant LY2 human breast cancer cells, Accepted, Cancer Letters.

