Morphological skull and pubic traits are routinely used in forensic anthropological casework for sex estimation, with the Walker (2008) and Klales et al. (2012) methods commonly employed. This research presents a comprehensive evaluation of reliability in skull and pubic trait scoring and of the implications of observer scoring variation for sex estimation accuracy. Results from previous studies are summarized and compiled into tables for comparison. The data for this study comprise a large compilation of Walker (2008) and Klales et al. (2012) trait scores (skull n = 392, pubic n = 443) contributed by seven researchers of varying levels of expertise. Intra- and interobserver analyses were performed on the trait scores, and variations in correct sex classifications were assessed among observers, with particular emphasis on the effects of observer experience. Statistical results indicate that the traits used in Walker (2008) and Klales et al. (2012) can be reliably scored, except for the mental eminence, which has shown considerable variation among studies and individuals. Resultant sex estimations improved with experience level, with the highest accuracy rates for both methods among experts. Although novice observers had good agreement in trait scores with more experienced observers, minor scoring differences negatively impacted classification accuracy, particularly in the Klales et al. (2012) method, with more than a 15% drop in accuracy. The results highlight the importance of experience, exposure to human variation, and training in these sex estimation methods, and also suggest that data from expert practitioners can be combined into larger databases.