Missing Data Imputation Using Morphoscopic Traits and Their Performance in the Estimation of Ancestry

Michael Kenyhercz
Nicholas Vere Passalacqua
Joseph T. Hefner

Abstract

Missing data are an inherent problem in biological anthropology, affecting both reference data sets and individual cases. The goal of data imputation in forensic anthropological applications is to accurately estimate missing values from other, observed values. To quantify the accuracy of imputed macromorphoscopic data under slight (10%), moderate (25%), and severe (50%, 75%, and 90%) amounts of missing data, we selected four data-imputation techniques: hot deck, iterative robust model-based imputation (IRMI), k-nearest neighbor (k-NN), and variable medians. Data were drawn from Hefner’s Macromorphoscopic Databank (Hefner 2018); the full sample consisted of 688 individuals from three U.S. populations (Blacks, Hispanics, and Whites). Six cranial macromorphoscopic variants were scored in accordance with Hefner (2009). The five data sets with missing data were randomly simulated from the original data over multiple iterations (N = 500 each). The imputed data sets were compared with the original data for agreement using weighted Cohen’s kappa, and their correct classification accuracies were compared with those calculated for the original data set over multiple iterations (N = 500). The latter comparisons were also used to examine the effects of imputed data on classification accuracies. Results suggest that IRMI is the most accurate method for imputing missing data, followed by k-NN, in each of the comparisons for nearly all of the variables imputed.
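
The simulate, impute, and compare workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it covers only the simplest of the four techniques (variable medians), uses a single simulated ordinal variable with a hypothetical score range of 1 to 4 instead of the Macromorphoscopic Databank, assumes values are missing completely at random, and runs one pass per missingness level where the study ran 500 iterations. The function names `mask_at_random`, `impute_median`, and `weighted_kappa` are introduced here for illustration, and linear weights are an assumption because the abstract does not state the weighting scheme.

```python
import random
from statistics import median_low


def mask_at_random(values, proportion, rng):
    """Return a copy of values with the given proportion set to None."""
    n_missing = round(len(values) * proportion)
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def impute_median(values):
    """Fill None entries with the median of the observed scores.

    median_low is used so the fill value is always an actual ordinal score.
    """
    observed = [v for v in values if v is not None]
    fill = median_low(observed)
    return [fill if v is None else v for v in values]


def weighted_kappa(a, b, categories):
    """Linear weighted Cohen's kappa for two equal-length lists of scores."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(a)
    # Joint proportion table of (score in a, score in b).
    table = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        table[index[x]][index[y]] += 1.0 / n
    row = [sum(table[i]) for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) for j in range(k)]
    disagree_obs = 0.0
    disagree_exp = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # linear disagreement weight
            disagree_obs += weight * table[i][j]
            disagree_exp += weight * row[i] * col[j]
    if disagree_exp == 0.0:
        return 1.0  # both lists fall in one category, so agreement is trivial
    return 1.0 - disagree_obs / disagree_exp


rng = random.Random(0)
categories = [1, 2, 3, 4]  # hypothetical ordinal score range
scores = [rng.choice(categories) for _ in range(200)]  # simulated, not real data

# The five missingness levels named in the abstract.
for proportion in (0.10, 0.25, 0.50, 0.75, 0.90):
    imputed = impute_median(mask_at_random(scores, proportion, rng))
    kappa = weighted_kappa(scores, imputed, categories)
    print(f"{proportion:.0%} missing: weighted kappa = {kappa:.3f}")
```

Because every masked value is replaced by the same score, agreement with the original data should fall as the proportion of missing values rises. Methods such as k-NN and IRMI instead borrow information from the other observed variables.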

Article Details

Section
Technical Notes