Abstract
Detection of frontal faces in natural images is a well-studied problem in computer vision, with open-source and commercial solutions that achieve a high degree of accuracy. Biologically inspired solutions to the problem have also been developed. The robustness of these solutions makes face detection an excellent platform for investigating the complexities of direct comparison between human and computer perceptual abilities. Here, we advance our understanding of the problem of human-computer comparison by presenting identical degraded stimuli to both human subjects and a selection of commonly used algorithms. The source images are grayscale frontal face images in which a face is detected 100% of the time by all detection algorithms and observers. Images are degraded by first applying a fast Fourier transform (FFT), finding the mean amplitude per spatial frequency across all images, combining the phase information in each image with randomized phase information at a range of coherence levels (from 10% to 90% randomization), then applying the inverse FFT to a frequency-domain signal with the average amplitudes and partially randomized phases. Neither the Haar cascade-based Viola-Jones classifiers nor two highly regarded commercial black-box classifiers assumed to work on a Haar-cascade model were able to detect any faces at coherence levels below 0.8. Human observers consistently detected faces at coherence levels of 0.6 and below. False positive and partial confidence results in human observers were qualitatively different from the results of any of the classifiers tested. Importantly, it was impossible to determine whether human observers were using a holistic face detection strategy, a parts-based strategy, or were relying on non-facial image cues.
While humans performed better, the uncertainty surrounding the detection method they used indicates that the perceptual function of face detection in humans cannot easily be directly compared to the function performed by computer algorithms, even when a test is devised that presents an identical challenge to both humans and computers.
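The degradation procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify how original and random phases were combined, so the sketch blends them as unit vectors weighted by coherence; it treats "mean amplitude per spatial frequency" as a per-bin average across images; and it reads coherence as the fraction of original phase retained. The function name `phase_scramble` and its parameters are hypothetical.

```python
import numpy as np


def phase_scramble(images, coherence, rng=None):
    """Return phase-degraded copies of a stack of grayscale images.

    images:    float array of shape (n_images, height, width)
    coherence: assumed to be the fraction of original phase retained (0.0 to 1.0)
    rng:       optional numpy.random.Generator for reproducible noise
    """
    rng = np.random.default_rng() if rng is None else rng
    images = np.asarray(images, dtype=float)

    # Forward FFT of every image; keep per-image phase, average the amplitude
    # at each frequency bin across the whole stack (assumed interpretation).
    spectra = np.fft.fft2(images, axes=(-2, -1))
    mean_amplitude = np.abs(spectra).mean(axis=0)
    original_phase = np.angle(spectra)

    # Take random phases from the FFT of real white noise so that they have
    # the conjugate symmetry required for a real-valued reconstruction.
    noise = rng.standard_normal(images.shape)
    random_phase = np.angle(np.fft.fft2(noise, axes=(-2, -1)))

    # Assumed mixing rule: weighted sum of unit phase vectors, which avoids
    # wrap-around problems that arise when averaging raw angles.
    blended = (coherence * np.exp(1j * original_phase)
               + (1.0 - coherence) * np.exp(1j * random_phase))
    mixed_phase = np.angle(blended)

    # Inverse FFT of the average amplitudes with partially randomized phases.
    degraded = np.fft.ifft2(mean_amplitude * np.exp(1j * mixed_phase),
                            axes=(-2, -1))
    return degraded.real  # imaginary part is numerical residue only
```

Under these assumptions, a stimulus set could be produced by calling `phase_scramble(stack, c)` for each coherence value `c` of interest, where `stack` is an array holding all source images at a common size.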
Meeting abstract presented at VSS 2012