Abstract
In 1970, Masahiro Mori proposed the uncanny valley (UV), a region in a human-likeness continuum where an entity risks eliciting a cold, eerie, repellent feeling. Recent studies have shown that this feeling can be elicited by entities modeled not only on humans but also nonhuman animals. The perceptual and cognitive mechanisms underlying the UV effect are not well understood, although many theories have been proposed to explain them. To test the predictions of nine classes of theories, a within-subjects experiment was conducted with 136 participants. The theories’ predictions were compared with ratings of 10 classes of stimuli on eeriness and coldness indices. One type of theory, configural processing, predicted eight out of nine significant effects. Atypicality, in its extended form, in which the uncanny valley effect is amplified by the stimulus appearing more human, also predicted eight. Threat avoidance predicted seven; atypicality, perceptual mismatch, and mismatch+ predicted six; category+, novelty avoidance, mate selection, and psychopathy avoidance predicted five; and category uncertainty predicted three. Empathy's main prediction was not supported. Given that the number of significant effects predicted depends partly on our choice of hypotheses, a detailed consideration of each result is advised. We do, however, note the methodological value of examining many competing theories in the same experiment.
This study aims to test the predictions of nine classes of UV theories by drawing on their authors’ definitions. To this end, our experiment included stimuli not found in the literature, such as houses and greebles. These design choices were deliberate. Although the term UV is typically applied to three-dimensional objects designed to look human, including robots, computer models, and prostheses, we seek to test whether the predictions of the theories, as defined, hold for other stimuli they were meant to cover. This will help to determine whether the UV effect results from a general perceptual process or a specific one related to anthropomorphism or zoomorphism. Refining the theories’ definitions in light of our findings would also contribute to the field.
We designed experimental conditions to test the predictions of the nine classes of UV theories introduced above. The following hypotheses characterize some of the predictions of the theories as we interpret them:
- H1. Thatcher humans elicit a stronger UV effect than humans.
- H2. Thatcher cats elicit a stronger UV effect than cats.
- H3. Thatcher houses elicit a stronger UV effect than houses.
- H4. Thatcherization elicits a stronger UV effect when applied to humans than to cats.
- H5. Thatcherization elicits a stronger UV effect when applied to cats than to houses.
- H6. Faces with distorted proportions elicit a stronger UV effect than undistorted faces.
- H7. Greebles elicit a stronger UV effect than familiar objects like humans, cats, and houses.
- H8. People with a disability resulting in facial dysmorphism elicit a stronger UV effect than people without one.
- H9. Diseased body parts elicit a stronger UV effect than humans.
Configural processing theories predict that the configural processing of a misconfigured exemplar elicits the UV effect (H1–3, 6, 8, and 9) and that the strength of the effect is proportional to the extent of configural processing (H4 and H5). These theories do not predict a UV effect for novel objects (null for H7).
Atypicality theory predicts that deviations from the prototype of an existing category elicit the UV effect (H1–3, 6, 8, and 9). Atypicality, in its general form, operates irrespective of the category. Earlier, we proposed an extended form of atypicality theory, atypicality+, in which the UV effect is the combined effect of atypicality and human likeness (H4 and H5). Atypicality does not predict that exemplars belonging to a novel category elicit the UV effect; hence, greebles would not elicit the effect (null for H7).
Moore's (2012) perceptual mismatch theory predicts that inconsistencies among features, regardless of the stimulus category, elicit the UV effect (H1–3 and 6). Mismatch+ theories additionally predict that the UV effect is the combined effect of feature inconsistency and human likeness (H4 and H5; e.g. MacDorman & Chattopadhyay, 2016). Greebles as novel objects do not have mismatched features (null for H7). People with a disability and diseased body parts have mismatched features (H8 and H9). However, they are not mismatched along the human–nonhuman or real–unreal dimensions described by mismatch+ theories (null for H8 and H9).
Category uncertainty theories predict that exemplars straddling category boundaries elicit the UV effect and that those lying within a category do not. A Thatcher human or human with distorted proportions could straddle the human–nonhuman boundary (H1 and H6). The same applies to Thatcher cats and houses (H2 and H3). Jentsch's formulation, category+, predicts a stronger UV effect on the human–nonhuman (H4) and living–inanimate (H5) boundaries. Novel objects are not predicted to produce a UV effect (null for H7).
Novelty avoidance theories predict that exemplars that do not belong to an established category elicit the UV effect (H1–3, 6, and 7), whereas exemplars that belong to an established category do not elicit the UV effect (null for H8 and H9).
Mate selection theory predicts a UV effect for human exemplars only (H1, 4, 6, 8, and 9). The theory predicts no UV effect for Thatcher cats (H2), Thatcher houses (H3), or greebles (H7), and no stronger effect of Thatcherization for cats than for houses (H5).
Psychopathy avoidance theory follows the same pattern as mate selection theory.
Threat avoidance theories predict a UV effect for humans (H1, 6, 8, and 9) and nonhuman animals (H2 and H5), with a stronger effect for humans than for nonhuman animals (H4). They do not predict a UV effect for houses (H3) or novel objects (H7).
Empathy, a participant trait, was tested separately:
- H10. The empathy quotient (EQ) predicts the UV effect.
Empathy theories predict that empathy for an inanimate object elicits the UV effect. An indirect consequence of this prediction is that the UV effect should be stronger in individuals with greater empathic abilities (H10).
Participants were recruited from Amazon Mechanical Turk. Inclusion criteria were English proficiency at the fluent level or higher, vision no more than moderately impaired with correction, and passing the reverse-scaled items check (i.e. each reverse-scaled item had to correlate negatively with its unreversed counterpart).
Of 551 initial prospects, 136 participants met the inclusion criteria, consented, and completed the survey (61% men, n = 83). Participants ranged in age from 19 to 73 (median = 35, interquartile range = 29 to 48); 64.0% were White, 30.8% Asian, 9.6% Black or African American, and 5.9% Hispanic; 81.6% resided in the United States, 14.7% in India, 1.5% in Brazil, and 0.7% each in Italy, Mexico, and Pakistan.
In our previous study (MacDorman & Chattopadhyay, 2016), a 50% reduction in the realism of the whole face increased eeriness, d = 0.72, and a 50% mean reduction in realism of just the eyes and mouth increased eeriness, d = 0.26. For an effect size of 0.26, a 1-way repeated measures ANOVA with 10 conditions, 5 stimuli per condition, and 136 participants has a power of 0.90 (λ = 3.25, df = 1215.00).
The experiment was approved by Indiana University's Office of Research Administration (November 11, 2019, OHRP Category 7, Study No. 1910602465). Informed consent was obtained from all participants. Documentation of informed consent was waived under 45 Code of Federal Regulations (CFR) 46.117(c) or 21 CFR 56.109(c)(1). Human subjects research was performed under the provisions of the Declaration of Helsinki and complied with federal, state, and university standards, policies, and regulations.
The experiment, implemented in Qualtrics as an online survey, was conducted from December 13 to 15 and December 22 to 24, 2019. Participants chose the location and time of day at which they took part.
After giving informed consent, each participant rated 50 images on the 10 scales listed above. Images were presented in random order. The participant then completed the EQ and demographics questionnaires. The experiment's average completion time was 50 minutes.
Maximum likelihood estimation was used to fit a one-way linear mixed-effects model. Planned contrasts were used to compare the differences between the conditions.
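For readers who want to see where a condition effect with the reported degrees of freedom comes from, the sketch below computes a one-way repeated-measures ANOVA on a participants × conditions matrix of scores (for example, each participant's mean rating of the five stimuli in a condition; that aggregation is our assumption). This is a simpler, swapped-in technique, not the authors' maximum likelihood mixed-effects fit, although the two are closely related in a balanced design with a random intercept per participant. The helper name `rm_anova_oneway` is hypothetical. Its error degrees of freedom, (k − 1)(n − 1) = 9 × 135 = 1215 for 10 conditions and 136 participants, match the denominator df of the F tests reported for eeriness and coldness.

```python
from statistics import fmean


def rm_anova_oneway(scores):
    """One-way repeated-measures ANOVA (hypothetical helper).

    scores: one row per participant, one column per condition;
    the design must be balanced, with no missing cells.
    Returns (F, df_condition, df_error, partial_eta_squared).
    """
    n = len(scores)       # participants
    k = len(scores[0])    # conditions
    grand = fmean([x for row in scores for x in row])
    cond_means = [fmean([row[j] for row in scores]) for j in range(k)]
    subj_means = [fmean(row) for row in scores]

    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    # The condition x participant residual is the error term, so stable
    # individual differences (ss_subj) are removed from it.
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f_stat = (ss_cond / df_cond) / (ss_error / df_error)
    eta_p2 = ss_cond / (ss_cond + ss_error)
    return f_stat, df_cond, df_error, eta_p2


# Toy data: 3 participants x 3 conditions.
print(rm_anova_oneway([[1, 2, 3], [2, 3, 5], [3, 4, 4]]))
```

On the toy data, F ≈ 9.0 with df = (2, 4) and partial η² ≈ 0.82. Removing participant variance from the error term is what gives within-subjects designs like this one their power.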
All hypotheses were directional (i.e. condition x > condition y); therefore, the planned contrasts were one-tailed tests. Because some hypotheses describe nonorthogonal contrasts, the p values were adjusted for multiplicity. This correction used the Westfall method (Bretz, Hothorn, & Westfall, 2011). Condition had a significant effect on eeriness, F(9, 1215) = 225.16, MSE = 249.09, p < 0.001, \(\eta_{p}^{2}\) = 0.63.
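The Westfall method is a step-down multiplicity adjustment that also exploits the correlations and logical constraints among contrasts. In R, for example, the multcomp package provides it through `adjusted(type = "Westfall")`. The sketch below is a deliberately simpler stand-in: Holm's step-down adjustment, which Westfall's procedure refines and which is typically more conservative. The helper name `holm_adjust` and the example p values are hypothetical. The inputs are assumed to be one-tailed p values like those of the planned contrasts.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p values (hypothetical helper).

    Not the Westfall method: Holm ignores correlations and logical
    constraints among contrasts, so it is typically more conservative.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the rank-th smallest p by the number of hypotheses
        # still in play, cap at 1, then enforce monotonicity.
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical one-tailed p values for three contrasts.
print(holm_adjust([0.01, 0.04, 0.03]))
```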
All hypotheses were supported (Table 5) except H10 (see below). Thatcher humans, cats, and houses were rated significantly eerier than normal humans, cats, and houses, respectively (H1–3). Thatcherization increased the eeriness of humans significantly more than cats (H4) and cats significantly more than houses (H5). Thus, the effect of Thatcherization increased with human likeness. (This pattern occurred even though the proportion of the image that was inverted for human stimuli was less than for cat stimuli and still less than for house stimuli.) Human faces with distorted proportions were rated significantly eerier than undistorted faces (H6). Greebles as exemplars of novel objects were rated significantly eerier than normal humans, cats, and houses (H7). People with a disability were rated significantly eerier than people without one (H8). Diseased body parts were rated significantly eerier than humans (H9).
Table 5. Planned contrasts for eeriness and coldness.
Empathy theories predict that the UV effect is elicited by empathy for an inanimate object. H10 states that the UV effect increases with empathetic abilities. However, a regression analysis revealed that an individual's EQ was a nonsignificant negative predictor of eeriness, r = –0.07, β = –0.07, t(678) = 3.68, p = 0.055, and explained a nonsignificant portion of the variance, R² = 0.01, adj. R² < 0.01, F(1, 678) = 29.75.
A one-way linear mixed-effects model revealed that condition had a significant effect on coldness, F(9, 1215) = 80.12, MSE = 187.35, p < 0.001, \(\eta_{p}^{2}\) = 0.37. Planned contrasts on coldness revealed the same results as on eeriness.
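Partial η² equals SS_condition / (SS_condition + SS_error), so it can be recovered from an F statistic and its degrees of freedom as F·df₁ / (F·df₁ + df₂). The short check below applies that identity to the eeriness and coldness tests; both reproduce the reported values after rounding. The helper name `partial_eta_squared` is hypothetical.

```python
def partial_eta_squared(f_stat, df_effect, df_error):
    """Recover partial eta squared from an F test (hypothetical helper)."""
    # F = (SS_eff / df_eff) / (SS_err / df_err), so
    # SS_eff / SS_err = F * df_eff / df_err.
    ratio = f_stat * df_effect / df_error
    return ratio / (1 + ratio)


# Condition effects as reported: F(9, 1215).
print(round(partial_eta_squared(225.16, 9, 1215), 2))  # eeriness: 0.63
print(round(partial_eta_squared(80.12, 9, 1215), 2))   # coldness: 0.37
```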
Table 5 shows that greebles were significantly colder than familiar objects like normal humans, cats, and houses (H7).
A regression analysis revealed that an individual's EQ was a nonsignificant negative predictor of coldness, r = –0.04, β = –0.03, t(678) = –1.09, p = 0.274, and explained a nonsignificant portion of the variance, R² < 0.01, adj. R² < 0.01, F(1, 678) = 1.20.
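With a single predictor, the correlation, the regression t statistic, and the model F test are linked: t = r·√(df / (1 − r²)) and F(1, df) = t². The sketch below applies these identities to the coldness regression. The helper names are hypothetical. Because r is reported to two decimals, the recomputed t (about −1.04) differs slightly from the reported −1.09. Conversely, `r_from_t(-1.09, 678)` gives about −0.042, which rounds to the reported r = −0.04. Likewise, (−1.09)² ≈ 1.19 is close to the reported F(1, 678) = 1.20, with the gap again reflecting rounding of t.

```python
from math import sqrt


def t_from_r(r, df):
    """t statistic for a Pearson correlation with df = n - 2."""
    return r * sqrt(df / (1 - r * r))


def r_from_t(t, df):
    """Inverse of t_from_r: recover r from a reported t and df."""
    return t / sqrt(t * t + df)


# Coldness regression as reported: r = -0.04, t(678) = -1.09, F(1, 678) = 1.20.
print(round(t_from_r(-0.04, 678), 2))
print(round(r_from_t(-1.09, 678), 3))
print(round((-1.09) ** 2, 2))
```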
Table 6 indicates for each theory whether the effect stated in the corresponding hypothesis was predicted and whether it was found.
Configural processing theories predict that the configural processing of a misconfigured exemplar elicits the UV effect and that the effect's strength is proportional to the extent of configural processing. Thatcherization increased the eeriness of humans, cats, and houses (H1–3), as predicted, given that all three are processed configurally. Thatcherization also increased the eeriness of humans more than cats (H4) and cats more than houses (H5), as predicted, given that participants have greater exposure to humans than to cats and that humans and cats have less variation in their configural pattern than houses. Faces whose proportions were distorted, either by artificial manipulation (H6) or by disability or disease (H8), were also rated eerier, as predicted, as were diseased body parts (H9). However, configural processing failed to predict that novel objects like greebles would be rated eerier than familiar ones like humans, cats, and houses (H7), although greebles were still less eerie than five of the conditions.
Atypicality theories predict that deviations from a category prototype elicit the UV effect, either irrespective of the category (atypicality) or in proportion to its degree of human likeness (atypicality+). Six conditions deviated from a category prototype, and all six were eerier than their controls (H1–3, 6, 8, and 9). Atypicality+ additionally predicted that human likeness increases the effect of Thatcherization (H4 and H5). However, atypicality failed to predict that greebles would be eerier than familiar objects like humans, cats, and houses, because greebles as novel objects lack an established category prototype from which to deviate (H7).
Perceptual mismatch theories predict that inconsistencies among the features of an exemplar elicit the UV effect. Distorted proportions, which create second-order inconsistencies, increased eeriness as predicted (H6). Thatcherization, which creates inconsistencies between inverted and other features, also increased eeriness as predicted (H1–3). As predicted by mismatch+, human likeness increased the effects of Thatcherization (H4 and H5). However, mismatch+, which focuses on inconsistencies along such dimensions as human likeness and realism, failed to predict the effects for people with a disability (H8) and diseased body parts (H9), both of which are fully human and fully real. Mismatch, in its general form, predicted these effects. Mismatch and mismatch+ also failed to predict the eeriness of novel objects (H7).
Category uncertainty theories predict that the UV effect is elicited by exemplars that straddle a category boundary. Even assuming Thatcher humans, cats, and houses straddled category boundaries, category uncertainty theories failed to predict five significant effects that atypicality+, configural processing, and threat avoidance predicted.
Novelty avoidance theories predict that exemplars not belonging to an established category elicit the UV effect. Greebles as novel objects were rated significantly eerier than familiar objects, a condition consisting of humans, cats, and houses (H7). However, even assuming Thatcherized and distorted exemplars were novel (H1–3 and 6), the theory failed to predict higher eeriness ratings of people with disabilities (H8) and diseased body parts (H9), although both should be established categories. Novelty avoidance also failed to predict the combined effect of Thatcherization and human likeness (H4 and H5).
Mate selection theory predicts that only humans elicit the UV effect because the underlying mechanism evolved to evaluate potential sexual targets. Although mate selection predicted higher eeriness ratings for all hypotheses involving human exemplars (H1, 4, 6, 8, and 9), it failed to predict those involving nonhuman exemplars: Thatcherization increased eeriness in cats and houses (H2 and H3) and increased it more in cats than houses (H5). It also failed to predict that novel objects would be eerier than familiar ones (H7).
Psychopathy avoidance theory makes the same predictions as mate selection theory regarding the hypotheses, with the results following the same pattern.
Threat avoidance theories predict that signs of contagious disease elicit the UV effect and exclude nonanimal stimuli as UV triggers. As predicted, diseased body parts were rated eerier than humans (H9). Assuming Thatcherization of humans and cats, distortion of humans, and disabilities were interpreted as signs of disease and greebles were interpreted as nonanimal, all other predicted effects were significant (H1, 2, 4–6, and 8). However, threat avoidance predicted neither that Thatcher houses would be eerier than normal houses (H3) nor that novel objects would be eerier than familiar ones (H7).
Empathy theories predict that empathy for an inanimate object elicits the UV effect. However, the UV effect did not increase with the participant's empathetic abilities (H10).
The experiment tested the predictions of nine different classes of UV theories. Configural processing and atypicality+ predicted eight out of nine significant effects; threat avoidance predicted seven; atypicality, perceptual mismatch, and mismatch+ predicted six; category+, novelty avoidance, mate selection, and psychopathy avoidance predicted five; and category uncertainty predicted three. Predicting fewer of the observed effects undermines a theory's generality. It does not, however, falsify the theory, because the same effect could have multiple causes, each explained by its corresponding theory. Empathy's key prediction was not supported, a result that could be investigated further with experimental methods.
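The counts above can be re-derived from the theory-by-theory discussion. In the sketch below, the mapping from each theory to the hypotheses it predicted is our reading of that discussion, not a copy of Table 6. In particular, category uncertainty is credited only with H1–3, as in its evaluation. Because H1–9 were all significant, each theory's count is simply the number of hypotheses it predicted. The names `PREDICTED`, `SIGNIFICANT`, and `tally` are hypothetical.

```python
# Hypotheses (H1-H9) each theory predicted as significant effects,
# as read from the discussion section (an interpretation, not Table 6).
PREDICTED = {
    "configural processing": {1, 2, 3, 4, 5, 6, 8, 9},
    "atypicality+": {1, 2, 3, 4, 5, 6, 8, 9},
    "threat avoidance": {1, 2, 4, 5, 6, 8, 9},
    "atypicality": {1, 2, 3, 6, 8, 9},
    "perceptual mismatch": {1, 2, 3, 6, 8, 9},
    "mismatch+": {1, 2, 3, 4, 5, 6},
    "category+": {1, 2, 3, 4, 5},
    "novelty avoidance": {1, 2, 3, 6, 7},
    "mate selection": {1, 4, 6, 8, 9},
    "psychopathy avoidance": {1, 4, 6, 8, 9},
    "category uncertainty": {1, 2, 3},
}

SIGNIFICANT = set(range(1, 10))  # H1-H9 were all supported


def tally(predicted, significant):
    """Count, per theory, the predicted effects that were significant."""
    return {name: len(hyps & significant) for name, hyps in predicted.items()}


# Print theories from most to fewest predicted significant effects.
for name, hits in sorted(tally(PREDICTED, SIGNIFICANT).items(),
                         key=lambda kv: -kv[1]):
    print(f"{name}: {hits} of {len(SIGNIFICANT)}")
```

Encoding the predictions as sets makes the arbitrariness noted below explicit: adding or dropping a hypothesis changes every count.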
Although the effects measured were too few to probe any one theory with sufficient thoroughness, they do identify predictions of the theories that need to be probed further. The implications of the experiment are examined below.
Our stimulus conditions were mainly designed to evaluate lower-level visual and cognitive processing, not the higher-level processing of robots, computer-animated characters, and other complex dynamic objects. A more holistic consideration of how human–robot interaction contributes to the UV effect should include dimensions of social communication. These include timing, contingency, interactivity, and motion quality, and their relation to nonvisual modalities, such as speech and touch, not to mention verbal communication, interpersonal relationships, culture, age, and personality (Brink et al., 2019; MacDorman, 2019; MacDorman et al., 2009a, 2009b; Shin, Kim, & Biocca, 2019; Tu, Chien, & Yeh, 2020).
The novel objects condition used only greebles; this category may not be representative of novel objects in general. To ensure representativeness, this condition may require more varied exemplars. Greebles may also have been rated relatively cold because they are computer renderings. Desaturating all images to make familiar objects more comparable to the monochromatic greebles may have reduced their ecological validity.
The diseased body parts condition lacked an adequate control condition, such as the same body part without disease. A better approach, given that human faces were used as controls for other conditions, would be to use similarly photographed diseased faces.
Turning to methodology, there is a degree of arbitrariness in evaluating classes of theories by the relative number of significant effects predicted. That number depends on the particular list of hypotheses and set of stimulus conditions selected.
The eeriness and coldness indices were reliable for all 10 stimulus conditions. Although they gave identical results for the tested hypotheses, their factor analysis, reliability coefficients, means by condition, and correlations indicated they measured different constructs. If combined, their reliability would fall to 0.19. However, items from the humanness index, which had loaded on one factor in robot and computer animation studies (Ho & MacDorman, 2010, 2017), separated into two factors, realism and humanness, which were not reliable in all conditions. Because neither was a dependent variable, this limitation does not affect the hypotheses.
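The section does not name its reliability coefficient. Assuming it is Cronbach's α, a common index for multi-item rating scales, the sketch below shows how α is computed from a respondents × items table. The helper name `cronbach_alpha` is hypothetical. Pooling items from indices that tap different constructs lowers the inter-item covariance relative to the variance of the totals, which is one way a combined index can end up with low reliability.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha (hypothetical helper).

    responses: one row per respondent, one column per item, all items
    scored in the same direction (reverse-scaled items already flipped).
    """
    k = len(responses[0])
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    # alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


print(cronbach_alpha([[1, 2], [2, 3], [3, 4]]))  # perfectly consistent items
print(cronbach_alpha([[1, 1], [2, 3], [3, 2]]))  # weaker agreement
```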
The mechanisms underlying aversion to Thatcher houses may differ from those underlying aversion to androids or computers with feelings. Depending on the situation, these phenomena could have different perceptual, cognitive, and affective mechanisms. Moreover, the mechanisms underlying, for example, configural processing and threat avoidance could operate in parallel. If so, more than one theory may be required to explain the UV effect (Gahrn-Andersen, 2020; Mangan, 2015; Wang, Lilienfeld, & Rochat, 2015). Different theories about the same mechanism may complement each other by focusing on different levels of description: neural, perceptual, cognitive, behavioral, evolutionary, and so on.
This experiment tested the predictions of nine widely varying classes of UV theories. Configural processing and atypicality+ theories had the greatest number of predictions with significant effects.
For all theories except novelty avoidance, the experiment used the same stimulus conditions. This approach is new: past experiments have tested the predictions of only one or, at most, two theories at a time.
Although the conditions were selected based on the predictions of each type of theory, the experiment only partially tested their assumptions. Future research should investigate the theories in more detail to explain the UV’s causes and mechanisms, which in turn should help designers avoid it.
Supported by a PROMOS scholarship from the German Academic Exchange Service and a doctoral scholarship from the German Academic Scholarship Foundation. Stimulus images of greebles were courtesy of Michael J. Tarr, Center for the Neural Basis of Cognition and Department of Psychology, Carnegie Mellon University, https://wiki.cnbc.cmu.edu/Novel_Objects.
Alexander Diel designed the experiments, recruited participants, collected and analyzed the data, and drafted the manuscript. Karl F. MacDorman proposed the topic, prepared the scales, analyzed the data, prepared the figures and tables, and revised the manuscript.
Commercial relationships: none.
Corresponding author: Karl F. MacDorman.
Email: kmacdorm@indiana.edu.
Address: Indiana University School of Informatics and Computing, 535 West Michigan St., Indianapolis, IN 46202, USA.