Daniel Manrique-Vallier, Ph.D., Department of Statistics, Indiana University

"Simultaneous Edit and Imputation for Categorical Microdata"

Statistical agencies and other organizations that collect and process data often face data files that contain faulty values. When these errors produce inconsistent records (such as pregnant men or married toddlers), agencies usually correct them through a process known as edit-imputation. The dominant paradigm for edit-imputation, due to Fellegi and Holt (1976), separates the task into an error-localization phase and an imputation phase, and is based on finding the minimal set of changes needed to make each record consistent. While this approach has the advantage of minimizing changes to the original data, it ignores the distribution of the data during error localization and thus produces biased imputations. It also ignores the uncertainty associated with the error-localization procedure. In this talk I introduce an alternative procedure for edit-imputation of categorical data based on joint modeling. The joint model includes a flexible representation of the underlying true values, with support only on the consistent responses; a model for the location of errors; and a model for the observed faulty data. Estimation is performed simultaneously using MCMC sampling. Through challenging data-based simulations I show that this method can deliver results far superior to those obtained with the Fellegi-Holt approach.
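As a rough illustration of the minimal-change idea behind error localization, the sketch below enumerates, by brute force, the smallest sets of fields that must be changed for a record to pass every edit rule. It is not the method presented in the talk, nor an implementation of Fellegi and Holt's procedure; the variables, categories, edit rules, and function names (`LEVELS`, `EDITS`, `minimal_change_sets`) are all hypothetical and chosen only to mirror the examples in the abstract.

```python
from itertools import combinations, product

# Hypothetical toy schema: variable name -> allowed categories.
LEVELS = {
    "sex": ["male", "female"],
    "age": ["toddler", "adult"],
    "marital": ["single", "married"],
    "pregnant": ["no", "yes"],
}

# Hypothetical edit rules: each returns True when a record is inconsistent.
EDITS = [
    lambda r: r["sex"] == "male" and r["pregnant"] == "yes",
    lambda r: r["age"] == "toddler" and r["marital"] == "married",
    lambda r: r["age"] == "toddler" and r["pregnant"] == "yes",
]


def is_consistent(record):
    """A record is consistent when it violates none of the edit rules."""
    return not any(edit(record) for edit in EDITS)


def minimal_change_sets(record):
    """Return every smallest set of fields whose values can be changed
    so that the record becomes consistent (brute-force search)."""
    fields = list(LEVELS)
    for k in range(len(fields) + 1):
        found = []
        for subset in combinations(fields, k):
            # Try every assignment of categories to the chosen fields.
            for values in product(*(LEVELS[f] for f in subset)):
                candidate = {**record, **dict(zip(subset, values))}
                if is_consistent(candidate):
                    found.append(subset)
                    break
        if found:
            return found  # stop at the smallest size that works
    return []


record = {"sex": "male", "age": "adult", "marital": "single", "pregnant": "yes"}
print(minimal_change_sets(record))
```

For this toy record, both `sex` and `pregnant` are single-field solutions. The minimal-change criterion by itself gives no basis for preferring one over the other, which is one way to see why ignoring the data distribution at this stage can matter.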
