Combining Continuous and Categorical Data Modeling in Developmental Age Estimation Using Hierarchical Bayes
Abstract
Residual correlations (correlations that persist after accounting for the effect of chronological age) between variables can have a significant impact on final age estimates. When not accounted for, such correlations can result in overly narrow age intervals and high error rates. Modeling correlations across mixed data types can be mathematically problematic. Hierarchical modeling can incorporate continuous and categorical traits into a single model that accounts for correlated variables while reducing computationally expensive calculations. This paper demonstrates a Bayesian hierarchical modeling approach in which trait variables were grouped by data type or bodily system, and each group was used to produce a separate age estimate with any appropriate model. These age estimates were combined into a single estimate using a multivariate normal model via nested cross-validation. The data used included nine diaphyseal length measurements and 29 epiphyseal fusion and ossification sites from 179 individuals in the publicly available U.S. Subadult Virtual Anthropology Database. Diaphyseal ages were modeled with linear regression and epiphyseal ages with random forest regression. Age estimates from the hierarchical model had reduced bias relative to diaphyseal or epiphyseal maximum likelihood estimates alone. Combined-indicator age intervals from 95% highest density regions (HDRs) were on average 15% narrower than those from diaphyseal 95% HDRs, while success rates were 2 percentage points lower (91% vs. 93%). Functional example code is provided. A general hierarchical modeling approach may be applicable to other areas of skeletal analysis that employ correlated variables of mixed data types, including adult age estimation and ancestry estimation.
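To illustrate the combination step, the following is a minimal sketch, not the authors' example code. It assumes only two sub-model age estimates, a flat prior on age, and errors that follow a bivariate normal distribution with constant standard deviations and a known residual correlation. The function name `combine_estimates` and all numeric inputs are hypothetical, chosen for illustration and not taken from the study. Under these assumptions the posterior for age is normal, so the 95% HDR coincides with the central 95% interval.

```python
import math
from statistics import NormalDist


def combine_estimates(x1, x2, s1, s2, rho):
    """Combine two age estimates whose errors are bivariate normal.

    x1, x2 : point age estimates from two sub-models (hypothetical inputs)
    s1, s2 : error standard deviations of each sub-model
    rho    : residual correlation between the two sub-model errors

    Assumes a flat prior on age, so the posterior is normal with the
    generalized-least-squares mean and variance. Returns (mean, sd).
    """
    if s1 <= 0 or s2 <= 0:
        raise ValueError("standard deviations must be positive")
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")

    cov = rho * s1 * s2                    # off-diagonal covariance term
    denom = s1**2 + s2**2 - 2.0 * cov      # proportional to 1' Sigma^-1 1
    w1 = (s2**2 - cov) / denom             # weight on the first estimate
    w2 = (s1**2 - cov) / denom             # weight on the second estimate
    mean = w1 * x1 + w2 * x2
    var = (s1**2 * s2**2 * (1.0 - rho**2)) / denom
    return mean, math.sqrt(var)


def interval_95(mean, sd):
    """Central 95% interval of a normal posterior (equal to its 95% HDR)."""
    z = NormalDist().inv_cdf(0.975)
    return mean - z * sd, mean + z * sd


if __name__ == "__main__":
    # Illustrative values only: two estimates in years with their error SDs.
    for rho in (0.0, 0.6):
        m, sd = combine_estimates(8.0, 9.0, 1.0, 1.5, rho)
        lo, hi = interval_95(m, sd)
        print(f"rho={rho}: age={m:.2f}, 95% interval=({lo:.2f}, {hi:.2f})")
```

With these illustrative inputs, treating the two estimates as independent (`rho=0.0`) gives a narrower interval than modeling a positive residual correlation (`rho=0.6`). This is how ignoring residual correlation can lead to overly narrow age intervals.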