Abstract
Previous generations of face recognition algorithms show differences in accuracy for faces of different races (race bias) (O’Toole et al., 1991; Furl et al., 2002; Givens et al., 2004; Phillips et al., 2011; Klare et al., 2012). Whether newer deep convolutional neural networks (DCNNs) are also race biased is less well studied (El Khiyari et al., 2016; Krishnapriya et al., 2019). Here we present methodological considerations for measuring underlying race bias. We consider two key factors: data-driven factors and scenario modeling. Data-driven factors stem from the data itself (e.g., the architecture of the algorithm, image quality, image population statistics). Scenario modeling considers the role of the “user” of the algorithm (e.g., threshold decisions and demographic constraints). To illustrate these issues in practice, we tested four face recognition algorithms: one pre-DCNN (A2011; Phillips et al., 2011) and three DCNNs (A2015, Parkhi et al., 2015; A2017b, Ranjan et al., 2017; A2019, Ranjan et al., 2019) on East Asian and Caucasian faces. First, for all four algorithms, the degree of race bias varied as a function of the identification decision threshold. Second, for all algorithms, achieving equal false accept rates (FARs) required higher identification thresholds for Asian faces than for Caucasian faces. Third, dataset difficulty affected both overall recognition accuracy and race bias. Fourth, demographic constraints on the formulation of the distributions used in the test affected estimates of algorithm accuracy. We conclude with a recommended checklist for measuring race bias in face recognition algorithms.
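The dependence of the false accept rate (FAR) on the identification threshold can be sketched as follows. This is a minimal illustration, not the paper's code: the similarity scores below are invented values chosen so that the impostor-score distribution for one group sits higher than the other, so matching a target FAR requires a higher threshold for that group.

```python
def far_at_threshold(impostor_scores, threshold):
    """FAR = fraction of impostor (different-identity) pairs whose
    similarity score meets or exceeds the decision threshold."""
    return sum(s >= threshold for s in impostor_scores) / len(impostor_scores)

def threshold_for_far(impostor_scores, target_far):
    """Smallest observed score usable as a threshold whose FAR does
    not exceed the target FAR."""
    for t in sorted(set(impostor_scores)):
        if far_at_threshold(impostor_scores, t) <= target_far:
            return t
    return max(impostor_scores)

# Hypothetical impostor similarity scores for two demographic groups;
# the second distribution is shifted upward to mimic the reported effect.
group_a_impostors = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55]
group_b_impostors = [0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65]

target_far = 0.2
t_a = threshold_for_far(group_a_impostors, target_far)  # 0.50
t_b = threshold_for_far(group_b_impostors, target_far)  # 0.60
print(t_a, t_b)  # the higher-scoring impostor distribution needs a higher threshold
```

With these made-up scores, equating FAR across the two groups forces different operating thresholds, which is why a single global threshold yields unequal false accept rates by group.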