Missing Data Imputation Using Morphoscopic Traits and Their Performance in the Estimation of Ancestry

Michael Kenyhercz; Nicholas Vere Passalacqua; Joseph T. Hefner

doi:10.5744/fa.2019.1015

Published: Nov 20, 2019

Keywords:

forensic anthropology, missing data imputation, macromorphoscopic traits, nonmetric data, classification

Michael Kenyhercz

Department of Defense POW/MIA Accounting Agency, Central Identification Laboratory, Joint Base Pearl Harbor–Hickam; Department of Anatomy, University of Pretoria

Nicholas Vere Passalacqua

Anthropology and Sociology, Western Carolina University

Joseph T. Hefner

Department of Anthropology, Michigan State University

Abstract

Missing data are an inherent problem in biological anthropology, affecting both reference data sets and individual cases. In forensic anthropological applications, the goal of data imputation is to accurately estimate missing values from the values that were observed. To quantify the accuracy of imputed macromorphoscopic data under slight (10%), moderate (25%), and severe (50%, 75%, and 90%) amounts of missing data, we selected four data-imputation techniques: Hot Deck, iterative robust model-based imputation (IRMI), k-nearest neighbor (k-NN), and variable medians. Data were drawn from Hefner's Macromorphoscopic Databank (Hefner 2018); the full sample consisted of 688 individuals from three U.S. populations (Blacks, Hispanics, and Whites). Six cranial macromorphoscopic traits were scored in accordance with Hefner (2009). Five data sets with missing data, one for each missingness level, were randomly simulated from the original data over multiple iterations (N = 500 each). Each imputed data set was compared with the original for agreement using weighted Cohen's kappa, and its correct classification accuracy over 500 iterations was compared with that calculated for the original data set. The latter comparisons were also used to examine the effect of imputed data on classification accuracy. Results suggest that IRMI is the most accurate method for imputing missing data, followed by k-NN, in each comparison and for nearly all imputed variables.
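The evaluation design summarized above (mask a complete data set at a fixed rate, impute the gaps, then score agreement between original and imputed values with weighted Cohen's kappa) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code. The data are synthetic scores coded 0 to 3 rather than the Hefner databank; missingness is assumed to be completely at random; only median imputation and a simplified k-NN imputer are shown (Hot Deck and IRMI are omitted; in R, the VIM package offers `hotdeck()`, `kNN()`, and `irmi()`); kappa uses linear weights; and agreement is computed only on the masked cells. All helper names (`synthetic_scores`, `mask_mcar`, `impute_median`, `impute_knn`, `weighted_kappa`) are hypothetical.

```python
import random
import statistics


def synthetic_scores(n_rows, n_traits, rng, n_categories=4):
    """Toy ordinal scores coded 0..n_categories-1 (NOT the Hefner databank).
    Traits share a latent level per individual, so similar rows carry
    information about each other, which is what k-NN imputation exploits."""
    rows = []
    for _ in range(n_rows):
        level = rng.randint(0, n_categories - 1)
        rows.append([min(n_categories - 1, max(0, level + rng.choice((-1, 0, 0, 1))))
                     for _ in range(n_traits)])
    return rows


def mask_mcar(data, rate, rng):
    """Copy `data` and set round(rate * cells) randomly chosen cells to None
    (assumed missing completely at random)."""
    cells = [(i, j) for i in range(len(data)) for j in range(len(data[0]))]
    masked = [row[:] for row in data]
    for i, j in rng.sample(cells, round(rate * len(cells))):
        masked[i][j] = None
    return masked


def impute_median(data):
    """Fill each missing cell with its trait's median observed score.
    median_low always returns an observed value, keeping results ordinal.
    Raises StatisticsError if a trait has no observed scores at all."""
    medians = [statistics.median_low([row[j] for row in data if row[j] is not None])
               for j in range(len(data[0]))]
    return [[medians[j] if v is None else v for j, v in enumerate(row)] for row in data]


def impute_knn(data, k=5):
    """Simplified k-NN: for a missing cell (i, j), rank other individuals who
    observe trait j by mean absolute score difference over the traits both
    observe, then take median_low of the k nearest scores. Falls back to the
    trait median when no comparable neighbour exists."""
    fallback = impute_median(data)
    result = [row[:] for row in data]
    n_cols = len(data[0])
    for i, row in enumerate(data):
        for j in range(n_cols):
            if row[j] is not None:
                continue
            candidates = []
            for m, other in enumerate(data):
                if m == i or other[j] is None:
                    continue
                shared = [c for c in range(n_cols)
                          if row[c] is not None and other[c] is not None]
                if shared:
                    dist = sum(abs(row[c] - other[c]) for c in shared) / len(shared)
                    candidates.append((dist, other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = [score for _, score in candidates[:k]]
            result[i][j] = statistics.median_low(nearest) if nearest else fallback[i][j]
    return result


def weighted_kappa(a, b, n_categories):
    """Cohen's kappa with linear disagreement weights |x - y| / (K - 1):
    kappa_w = 1 - sum(w * observed) / sum(w * expected)."""
    n = len(a)
    observed = [[0.0] * n_categories for _ in range(n_categories)]
    for x, y in zip(a, b):
        observed[x][y] += 1 / n
    row_marg = [sum(r) for r in observed]
    col_marg = [sum(observed[x][y] for x in range(n_categories)) for y in range(n_categories)]
    num = den = 0.0
    for x in range(n_categories):
        for y in range(n_categories):
            w = abs(x - y) / (n_categories - 1)
            num += w * observed[x][y]
            den += w * row_marg[x] * col_marg[y]
    return 1.0 - num / den


if __name__ == "__main__":
    rng = random.Random(2019)  # arbitrary seed
    original = synthetic_scores(200, 6, rng)
    for rate in (0.10, 0.25, 0.50, 0.75, 0.90):
        masked = mask_mcar(original, rate, rng)
        for name, method in (("median", impute_median), ("k-NN", impute_knn)):
            imputed = method(masked)
            kappas = []
            for j in range(6):
                rows = [i for i in range(len(original)) if masked[i][j] is None]
                if rows:  # compare only the cells that were actually imputed
                    kappas.append(weighted_kappa([original[i][j] for i in rows],
                                                 [imputed[i][j] for i in rows], 4))
            print(f"{rate:.0%} missing, {name}: mean weighted kappa = "
                  f"{statistics.mean(kappas):.3f}")
```

Because median imputation gives every masked cell of a trait the same score, weighted kappa on those cells is zero (up to rounding): observed and chance-expected disagreement coincide exactly.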

Issue

Vol. 2 No. 3 (2019)

Section

Technical Notes
