Asal Baragchizadeh, Kimberley D. Orsten-Hooge, Thomas P. Karnowski, David S. Bolme, Regina Ferrell, Parisa R. Jesudasen, Carlos D. Castillo, Alice J. O’Toole; Seeing Through De-Identified Faces in Videos by Humans and a Deep Convolutional Neural Network. Journal of Vision 2020;20(11):757. doi: https://doi.org/10.1167/jov.20.11.757.
The increasing use of cameras in public spaces raises privacy concerns that have spurred the development of face de-identification methods. These methods aim to obscure identity while preserving facial actions. We evaluated the performance of eight face de-identification algorithms on naturalistic driving data. Humans and a pre-trained, high-performing deep convolutional neural network (DCNN) (Ranjan et al., 2018) were tested on their ability to “see through” the identity-masking methods. The de-identification algorithms included a personalized supervised bilinear regression method for facial action transfer, a generic avatar face, four edge-detection methods, and two combined masking approaches. In an old/new recognition experiment, humans (n = 160) learned driver identities from high-resolution images and were tested on drivers in low-resolution videos. Faces in the videos were either intact or masked with one of the algorithms. Identification accuracy was lower in the masked conditions (p < .001; intact-face, d′ = 1.11; minimums: Canny edge-detection and avatar, d′ = .17 and .16, respectively). Subjects exhibited a conservative decision bias for all videos (maximum: avatar, C = .74; minimum: Canny-inverted edge-detection, C = .18). Next, the DCNN was tested with the high-resolution images and frames extracted from the videos. The output of the penultimate layer of the network served as a driver’s face representation. For each video, we created a driver identity template by averaging the DCNN face representations across frames. Identification was measured as the cosine similarity between face representation vectors. Performance was tested between the high-resolution images and each of the masked and unmasked video templates. The DCNN performed surprisingly well across conditions, given the challenging viewing conditions.
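The template-matching procedure described above (averaging per-frame DCNN embeddings into an identity template, then scoring identity as the cosine between vectors) can be sketched as follows. This is a minimal illustration under the assumption that face representations are available as NumPy vectors; the function names are ours, not the authors' code.

```python
import numpy as np

def identity_template(frame_embeddings):
    """Average per-frame face embeddings (one vector per video frame)
    into a single identity template for that driver's video."""
    return np.mean(np.asarray(frame_embeddings, dtype=float), axis=0)

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors; values near 1
    indicate a likely identity match."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Illustrative usage: compare a high-resolution image embedding
# against a template built from (hypothetical) video-frame embeddings.
template = identity_template([[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]])
score = cosine_similarity([1.0, 0.05, 0.05], template)
```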
Across conditions, there was general accord in the pattern of performance for humans and machines, with the best (unmasked) and worst (Canny edge-detection and avatar) conditions aligning. We propose that humans and machines should both be utilized for evaluating de-identification methods.