A Comparison of Scales for Recording and Evaluating Dental Arcade Shape

Christopher Maier
Kelly Heim-Maier
Erin McCoy


Several recording scales have been proposed for evaluating the shape of the dental arcade; however, no study has compared them to assess which performs best. Here, several scales (Gill 1971, 1995; Gill & Rhine 1986; Hefner & Linde 2018; Hooton, The Harvard Blanks n.d.; Maier 2017; Maier et al. 2015) are compared and evaluated on two criteria: (1) low observer error and (2) strong association with sample groups. Digital photographs of 659 individuals from collections across the United States were assessed for dental arcade shape. Scores were recorded by three observers to test replicability. Additionally, the relationship between each scale and the sample groups was evaluated using chi-square tests and several measures of effect size (Cramér’s V, Sakoda’s C, Goodman–Kruskal lambda). Values for Fleiss’s kappa range from “fair” to “almost perfect” across intra- and interobserver comparisons (κ = 0.212–0.851). Nearly all scales exhibit significant associations with the sample groups, though the general trend is toward weak effect sizes. All values for Cramér’s V and Sakoda’s C fall below 0.3, and the lambda statistic indicates an average reduction in prediction error of no more than 6%. The Gill scale is the most reliably recorded but is tied to typological approaches to human variation. A five-point scale proposed by Maier (2017) is less replicable but has the largest effect sizes (“moderate” compared to “weak”). Recording the angle of the sides of the dental arcade may be as informative as several of these scales and avoids many typological associations.
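
For readers unfamiliar with the statistics named above, the following is a minimal sketch of how each could be computed from count data, using the standard textbook formulas. It is not the authors' analysis code. The function names, the table layout (rows as arcade-shape scores, columns as sample groups), and the choice to compute lambda in the direction of predicting group from score are all assumptions made for illustration.

```python
from math import sqrt

def chi_square(table):
    """Pearson chi-square statistic for an r x c table of counts."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip cells in empty rows or columns
                stat += (observed - expected) ** 2 / expected
    return stat

def cramers_v(table):
    """Cramer's V: chi-square scaled by n and the smaller table dimension."""
    n = sum(map(sum, table))
    k = min(len(table), len(table[0]))
    return sqrt(chi_square(table) / (n * (k - 1)))

def sakoda_c(table):
    """Sakoda's adjusted C: Pearson's contingency coefficient divided by
    its maximum attainable value, sqrt((k - 1) / k)."""
    n = sum(map(sum, table))
    k = min(len(table), len(table[0]))
    chi2 = chi_square(table)
    pearson_c = sqrt(chi2 / (chi2 + n))
    return pearson_c / sqrt((k - 1) / k)

def gk_lambda(table):
    """Goodman-Kruskal lambda for predicting the column (group) from the
    row (score). Assumed direction; the statistic is asymmetric."""
    n = sum(map(sum, table))
    modal_column = max(sum(col) for col in zip(*table))
    modal_within_rows = sum(max(row) for row in table)
    return (modal_within_rows - modal_column) / (n - modal_column)

def fleiss_kappa(ratings):
    """Fleiss's kappa. ratings[i][j] is the number of observers who placed
    individual i in category j; every row must sum to the same count."""
    n_subjects = len(ratings)
    n_raters = sum(ratings[0])
    # Mean observed agreement across individuals.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ) / n_subjects
    # Agreement expected by chance from the overall category proportions.
    total = n_subjects * n_raters
    p_e = sum((sum(col) / total) ** 2 for col in zip(*ratings))
    return (p_bar - p_e) / (1 - p_e)
```

All three effect sizes run from 0 (no association) to 1 (perfect association), which is what allows values for scales with different numbers of categories to be compared against common thresholds such as 0.3. Lambda is undefined when every individual falls in a single group, a case this sketch does not guard against.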

