However, why can we recognize objects so quickly and accurately, and which mechanisms underlie this rapid processing of objects? Some studies have found evidence suggesting that visual processing of objects consists of two stages: an early perceptual stage (∼75 ms) that distinguishes between observed features and a later stage for decision making (∼150 ms; Johnson & Olshausen,
2003,
2005; VanRullen & Thorpe,
2001b). While the diagnostic act of recognizing and categorizing objects probably happens at a later stage (Smith, Gosselin, & Schyns,
2007; van Rijsbergen & Schyns,
2009), these studies stress that the analysis of features performed by our visual system is highly important for successful detection and identification. In a similar vein, several other studies suggested that the automatic feed-forward analysis of specific image features is mainly responsible for accurate and fast behavioral responses (Kirchner & Thorpe,
2006; Thorpe et al.,
1996; VanRullen & Thorpe,
2001a,
2001b). Results that support this line of thought come from a variety of studies in which specific image statistics, such as color, texture, and shape, were varied to study their effect on detection (e.g., Vogels,
1999). Image statistics that have been found to aid object or scene recognition are contrast (Brodie, Wallace, & Sharrat,
1991; Macé, Delorme, Richard, & Fabre-Thorpe,
2010), color (Brodie et al.,
1991; Delorme, Richard, & Fabre-Thorpe,
2010; Elder & Velisavljević,
2009; Gegenfurtner & Rieger,
2000; Goffaux et al.,
2005; Humphrey, Goodale, Jakobson, & Servos,
1994; Oliva & Schyns,
2000; Wichmann, Sharpe, & Gegenfurtner,
2002; Wurm, Legge, Isenberg, & Luebker,
1993; Yao & Einhäuser,
2008), background coherence with the object (Biederman,
1972; Biederman, Glass, & Stacy,
1973), shape or posture (Delorme et al.,
2010; Elder & Velisavljević,
2009), object size (Delorme et al.,
2010), texture (Delorme et al.,
2010; Elder & Velisavljević,
2009; Renninger & Malik,
2004), Fourier spectra (Gaspar & Rousselet,
2009; Joubert, Rousselet, Fabre-Thorpe, & Fize,
2009; McCotter, Gosselin, Sowden, & Schyns,
2005), and, though weakly, luminance (Delorme et al.,
2010; Elder & Velisavljević,
2009; Macé et al.,
2010; Nandakumar & Malik,
2009). Many of these studies used data sets of images that contain objects belonging to an explicit (target) category. Rarely, however, were subcategories or image features controlled for. Depending on the specific choice, categories might be subject to a selection bias and may have distinct visual features that make them especially easy to detect and recognize (Öhman,
1993; Öhman, Flykt, & Esteves,
2001; Tipples, Young, Quinlan, Broks, & Ellis,
2002). Frequently, animals are chosen as the target category, and, indeed, they have very specific characteristics, such as eyes and elongated legs, that are of considerable importance for their successful identification (Delorme et al.,
2010). In addition, images of animals are likely to be pre-segmented by the photographer (Wichmann, Drewes, Rosas, & Gegenfurtner,
2010) and animals are, therefore, often depicted in “unnatural” positions and figure–ground separations. Hence, a variety of features and their constellations are important for efficient and rapid processing of objects. It has remained elusive, however, to what extent each of these features contributes, relative to the others, to recognition performance. Here, we integrate a large variety of features and their relations into a model to predict several measures of performance.
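To make the image statistics listed above more concrete, the following minimal sketch illustrates how three of them (mean luminance, RMS contrast, and the slope of the Fourier amplitude spectrum) could be computed for a grayscale image with NumPy. It is a simplified illustration under common definitional assumptions, not the exact procedure of any study cited above; the function name and its conventions are hypothetical.

```python
# Minimal sketch (illustrative assumptions, not the procedure of any cited
# study): mean luminance, RMS contrast, and the log-log slope of the
# rotationally averaged Fourier amplitude spectrum of a grayscale image.
import numpy as np

def image_statistics(img):
    """img: 2-D array of grayscale intensities in [0, 1]."""
    mean_luminance = img.mean()
    rms_contrast = img.std()  # RMS contrast: SD of pixel intensities

    # Rotational average of the Fourier amplitude spectrum.
    amplitude = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    cy, cx = img.shape[0] // 2, img.shape[1] // 2
    y, x = np.indices(img.shape)
    radius = np.hypot(y - cy, x - cx).astype(int)
    radial_sum = np.bincount(radius.ravel(), weights=amplitude.ravel())
    radial_count = np.bincount(radius.ravel())

    # Restrict to frequencies up to the shorter image half-dimension.
    max_f = min(cy, cx)
    radial_mean = radial_sum[:max_f] / radial_count[:max_f]

    # Slope of log amplitude vs. log spatial frequency; natural images
    # typically fall off roughly as 1/f (slope near -1). DC term skipped.
    freqs = np.arange(1, max_f)
    slope, _ = np.polyfit(np.log(freqs), np.log(radial_mean[freqs]), 1)

    return mean_luminance, rms_contrast, slope

# Example: statistics of a random-noise "image".
stats = image_statistics(np.random.rand(256, 256))
```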