One of the goals of material perception research is to untangle the processes that take place in our visual system, in order to understand their roles and the information they carry. There is an ongoing debate on whether our visual system solves an inverse optics problem (
Kawato et al., 1993;
Pizlo, 2001), or whether it matches the statistics of its input (
Adelson, 2000;
Motoyoshi et al., 2007;
Thompson et al., 2016) to understand the world around us. Later studies on material perception dismiss the inverse optics approach, arguing that it is unlikely that our brain estimates the reflectance parameters of a surface when, for instance, judging its glossiness (
Fleming, 2014;
Geisler, 2008). Instead, they suggest that our visual system joins low and midlevel statistics to make judgments about surface properties (
Adelson, 2008). On this hypothesis,
Motoyoshi et al. (2007) suggest that the human visual system could rely on some measure of histogram symmetry, such as skewness, to distinguish glossy surfaces. Other works have explored image statistics in the frequency domain (
Hawken & Parker, 1987;
Schiller et al., 1976), for instance, to characterize material properties (
Giesel & Zaidi, 2013), or to discriminate textures (
Julesz, 1962;
Schaffalitzky & Zisserman, 2001). However, it has been questioned whether our visual system actually derives any aspects of material perception from such simple statistics (
Anderson & Kim, 2009;
Kim & Anderson, 2010;
Olkkonen & Brainard, 2010). Instead, recent work by
Fleming and Storrs (2019) proposes that, to infer scene properties, our visual system performs an efficient and accurate encoding of the proximal stimulus (the image input to our visual system). Thus, highly nonlinear models, such as deep neural networks, may better explain human perception. In line with these observations,
Bell et al. (2015) show how deep neural networks can be trained in a supervised fashion to accurately recognize materials, and
Wang et al. (2016) later extend it to also recognize materials in light fields. Closer to our work,
Lagunas et al. (2019) devise a deep learning-based material similarity metric that correlates with human perception. They collected judgements of perceived material similarity as a whole, without explicitly taking into account the influence of geometry or illumination, and built their metric upon such judgements. In contrast, we focus on analyzing to what extent geometry and illumination interfere with our perception of material appearance. We launch several behavioral experiments with carefully controlled stimuli, asking participants to specify which materials are closer to a reference. In addition, taking inspiration from these recent works, we explore how highly nonlinear models, such as deep neural networks, perform in material classification tasks. We find that such models are capable of accurately recognizing materials, and further observe that they may share similar high-level factors with humans when doing so.
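To make concrete the kind of simple image statistic discussed above, the following Python sketch computes the skewness of an image's luminance histogram, the asymmetry measure that Motoyoshi et al. (2007) relate to perceived gloss. The function name and the synthetic stimuli are our own illustration and do not come from any of the cited works.

```python
import numpy as np

def luminance_skewness(image):
    """Skewness of the luminance distribution: a simple low-level
    statistic of the kind proposed as a cue to perceived gloss."""
    lum = np.asarray(image, dtype=np.float64).ravel()
    mu = lum.mean()
    sigma = lum.std()
    if sigma == 0:
        return 0.0
    return float(np.mean(((lum - mu) / sigma) ** 3))

# A synthetic "glossy-like" stimulus: a mostly uniform surface with a
# few bright specular highlights yields a positively skewed histogram,
# whereas the purely diffuse surface yields a roughly symmetric one.
rng = np.random.default_rng(0)
diffuse = rng.normal(0.3, 0.05, size=(64, 64))
glossy = diffuse.copy()
glossy[:4, :4] = 1.0  # sparse bright highlights
```

Under this construction, the glossy stimulus produces a clearly positive skewness while the diffuse one stays near zero, mirroring the intuition that specular highlights skew the luminance histogram.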