Below is a brief (non-comprehensive) list of representative publications from our faculty and research staff. To see additional publications by individual department members, visit our people page.


Barnes CN and Rai SN. An exact method for link parameter estimation in error benchmarking: an application to Phase II two-stage single arm oncology trials. Statistical Methods in Medical Research, 20(5):523-529, 2011. PMID: 20696674.

Boone SD, Baumgartner KB, Baumgartner RN, Connor AE, Pinkston CM, Rai SN, Riley EC, Hines LM, Giuliano AR, John EM, Stern MC, Torres-Mejia G, Wolff RK, Slattery ML. Associations between CYP19A1 polymorphisms, Native American ancestry, and breast cancer risk and mortality: the Breast Cancer Health Disparities Study. Cancer Causes Control. Nov 2014;25(11):1461-1471. DOI: 10.1007/s10552-014-0448-5.

Chakraborty, S., Datta, S. and Datta, S. Surrogate variable analysis using partial least squares (SVA‐PLS) in gene expression studies. Bioinformatics, 28, 799‐806 (2012). PMID: 22238271

Connor AE, Baumgartner RN, Baumgartner KB, Pinkston CM, John EM, Torres-Mejia G, Hines LM, Giuliano AR, Wolff RK, Slattery ML. Epidermal growth factor receptor (EGFR) polymorphisms and breast cancer among Hispanic and non-Hispanic white women: the Breast Cancer Health Disparities Study. Int J Mol Epidemiol Genet. 2013;4(4):235-249.

Connor AE, Baumgartner RN, Pinkston C, Baumgartner KB. Obesity and risk of breast cancer mortality in Hispanic and Non-Hispanic white women: the New Mexico Women's Health Study. J Womens Health (Larchmt). Apr 2013;22(4):368-377. DOI: 10.1089/jwh.2012.4191

Daniels, M.J., and Gaskins, J.T. (2013) Bayesian methods for the analysis of mixed categorical and continuous (incomplete) data. In Analysis of Mixed Data: Methods and Applications (edited by A.R. de Leon and K. Carriere Chough). pp. 189-208. Chapman & Hall/CRC Press.

Datta, S., Lorenz, D., and Datta, S. Approximate U‐statistics for state waiting times under right censoring. In Modern Multivariate and Robust Methods (K. Nordhausen, S. Taskinen, Eds.), Springer (2015).

Datta, S., Nevalainen, J. and Oja, H. A general class of signed rank tests for clustered data when the cluster size is potentially informative. Journal of Nonparametric Statistics, 24, 797‐808 (2012). PMCID: PMC3467023

Datta, S., Pihur, V. and Datta, S. An adaptive optimal ensemble classifier via bagging and rank aggregation with applications to high dimensional data. BMC Bioinformatics, 11, 427 (2010). PMCID: PMC2933716

Datta S. (2013) Feature selection and machine learning with mass spectrometry data. Methods Mol Biol. 2013;1007:237-62. Springer. doi: 10.1007/978-1-62703-392-3_10

Datta, S. and Gill, R. (2014) Statistical Analysis of DNA Microarray Data. Revision for publication in Wiley StatsRef: Statistics Reference Online.

Fan, J. and Datta, S. Fitting accelerated failure time models to clustered survival data with potentially informative cluster size. Computational Statistics & Data Analysis, 55, 3295‐3303 (2011).

Gaskins, J.T., and M.J. Daniels. (2015) Covariance partition priors: A Bayesian approach to simultaneous covariance estimation for longitudinal data. Journal of Computational and Graphical Statistics. Forthcoming.

Gaskins, J.T., and M.J. Daniels. (2013) A nonparametric prior for simultaneous covariance estimation. Biometrika, 100(1): 111-124.

Gaskins, J.T., M.J. Daniels, and B.H. Marcus. (2014) Sparsity inducing prior distributions for correlation matrices of longitudinal data. Journal of Computational and Graphical Statistics. 23(4):966-984.

Gill, R., Datta, S. and Datta, S. A statistical framework for differential network analysis from microarray data using partial least squares, BMC Bioinformatics, 11, 95 (2010). PMCID: PMC2838870

Kong M, Xu S, Levy S, and Datta S (2015). GEE type inference for clustered zero-inflated negative binomial regression with application to dental caries. Computational Statistics and Data Analysis 85, 54-66. [PMID: 25620827]

Kong M, Yan J (2011). Modeling and testing treated tumor growth using cubic smoothing splines. Biometrical Journal 53, 595-613. [PMID 21604288]

Kong M, Lee JJ (2008). A semiparametric model for assessing drug interaction. Biometrics 64, 396-405. [PMID: 17900314]

Kulasekera K.B. (1995). Comparison of Regression Curves using Quasi Residuals. Journal of the American Statistical Association 90: 1085-1094.

Kulasekera K.B. and Wang J. (1997). Smoothing Parameter Selection for Power Optimality in Testing of Regression Curves. Journal of the American Statistical Association (92): 500-511.

Kuruwita C., Kulasekera K.B., and Gallagher C. (2011). Varying Coefficient Model with Unknown Link. Biometrika, Vol 98: 701-710.

Li, X., Gill, R., Cooper, N.G.F., Yoo, J.K., and Datta, S. (2011). Modeling microRNA-mRNA Interactions Using PLS Regression in Human Colon Cancer. BMC Medical Genomics, 4, 44. PMCID: PMC3123543

Lin, W. and Kulasekera K.B. (2007). Uniqueness of a Single Index Model. Biometrika, Vol 94: 496-501.

Lorenz, D.J., Levy, S., Datta, S. Inferring marginal association with paired and unpaired clustered data. Statistical Methods in Medical Research. 2016 Sep 20. pii: 0962280216669184. DOI: 10.1177/0962280216669184. PMID: 27655806.

Lorenz DJ, Gill RS, Mitra R, Datta S. Using RNA-seq Data to Detect Differentially Expressed Genes. In: Statistical Analysis of Next Generation Sequence Data, Springer-Verlag. 2014.

Lorenz DJ, Datta S. A nonparametric analysis of waiting times from a multistate model using a novel linear hazards model approach. Electronic Journal of Statistics. 2015; 9: 419-443. DOI: 10.1214/15-EJS1003.

Lorenz DJ, Datta S. Comparing waiting times in a multi-stage model: A log-rank approach. Journal of Statistical Planning and Inference. 2012; 142(10): 2832-2843. DOI: 10.1016/j.jspi.2012.04.003.

Lorenz DJ, Datta S, Harkema SJ. Marginal association measures for clustered data. Statistics in Medicine. 30 Nov. 2011; 30(27). DOI: 10.1002/sim.4368.

Mostajabi, F. and Datta, S. Nonparametric regression of state occupation, entry, exit and waiting times with multistate right censored data. Statistics in Medicine, 32, 3006‐3019 (2013). PMID: 23225570

Pierce MC, Kaczor K, Aldridge S, O’Flynn J, Lorenz D. Bruising Characteristics Discriminate Physical Child Abuse From Accidental Trauma in Young Children. Pediatrics. 2010 Jan; 125(1):67-74.

Pinkston CM, Baumgartner RN, Connor AE, Boone SD, Baumgartner KB. Physical activity and survival among Hispanic and non-Hispanic white long-term breast cancer survivors and population-based controls. J Cancer Surviv. Mar 5 2015. DOI: 10.1007/s11764-015-0441-3

Rai SN, Pan J, Yuan X, Sun J, Hudson MM and Srivastava DK. Estimating Incidence Rate on Current Status Data with Application to a Phase IV Cancer Trial. Communications in Statistics – Theory and Methods. 42(17):2417-2433, 2013.

Rai SN, Ray HE, Pan J, Barnes C, Cambon AC, Wu X, Bonassi S, and Srivastava DK. Phase II Clinical Trials: Issues and Practices. 1(2):1-3, 2014.

Rai SN, Ray HE, Yuan X, Pan J, Hamid T, Prabhu SD. Statistical Analysis of Repeated MicroRNA High Throughput Data with Application to Human Heart Failure: A Review of Methodology. Open Access Medical Statistics, 2:21-31, 2012. PMID: 24738042 and PMCID: PMC3984897

Ray HE and Rai SN. Flexible Bivariate Phase II Clinical Trial Design Incorporating Toxicity and Response on Different Schedules. Statistics in Medicine. 32(3):470-485, 2013. PMID: 23147373

Shen, Y., Wu, D., and Zelen, M. (2001). Testing the independence of two diagnostic tests. Biometrics. Vol. 57, No. 4, 1009-1017.

Stolzenberg-Solomon RZ, Falk RT, Stanczyk F, Hoover RN, Appel LJ, Ard JD, Batch BC, Coughlin J, Han X, Lien LF, Pinkston CM, Svetkey LP, Katki HA. Sex hormone changes during weight loss and maintenance in overweight and obese postmenopausal African-American and non-African-American women. Breast Cancer Res. 2012;14(5):R141. DOI: 10.1186/bcr3346

Wan Y, Datta S, Conklin D, Kong M (2015). Variable selection models based on multiple imputation with an application for predicting median effective dose and maximum effect. Journal of Statistical Computation and Simulation. 85(9), 1902-1916.

Wang D, Gallagher C, McMahan, and Kulasekera K.B. (2014). Semiparametric group testing regression models, Biometrika 101(3): 587-598.

Wang M, Kong M, Datta S (2011). Inference for marginal linear models with clustered longitudinal data for potentially informative cluster sizes. Statistical Methods in Medical Research 20, 347-367. [PMID: 20223781]

Wu, D., Kafadar K, Rosner GL (2014). Inference of long term effects and over-diagnosis in periodic cancer screening. Statistica Sinica. 2014; 24: 815-831.

Wu, D., Kafadar K, Rosner GL, Broemeling LD (2012). The lead time distribution when lifetime is subject to competing risks in cancer screening. The International Journal of Biostatistics. Vol. 8(1), Article 6. DOI: 10.1515/1557-4679.1363.

Wu, D., Rosner GL, and Broemeling LD (2007). Bayesian inference for the lead time in periodic cancer screening. Biometrics. Vol. 63, 873-880.

Wu, D., Rosner GL, and Broemeling LD (2005). MLE and Bayesian inference of age-dependent sensitivity and transition probability in periodic screening. Biometrics, Vol. 61, No. 4, 1056-1063.

Zheng, Q., Gallagher, C., and Kulasekera, K.B. (2013) Adaptively weighted kernel regression, Journal of Nonparametric Statistics, 25 (4), 855-872.

Zheng, Q., Kulasekera, K.B., and Gallagher, C. (2013) Adaptive penalized quantile regression for high dimensional data, Journal of Statistical Planning and Inference, 142 (6), 1029-1038.

Zheng, Q., Gallagher, C., and Kulasekera, K.B. (2013) The growth rate of significant regressors for high dimensional data, Statistics & Probability Letters, 83 (9), 1969-1972.

Zheng, Q., Kulasekera, K.B., and Gallagher, C. (2010) Local adaptive smoothing in kernel regression estimation, Statistics & Probability Letters, 80 (7-8), 540-547.
