One perspective suggests that invariant object recognition is achieved by generating an internal representation consisting of a three-dimensional (3-D) model of each object describing its parts and configuration (Biederman,
1987). Exposure to one or a few object views that provide enough information about the parts and configuration of an object should be sufficient to build such a 3-D model. An implication is that training with a single view will generalize to views that are far from it (Wang, Obama, Yamashita, Sugihara, & Tanaka,
2005), as long as nonaccidental properties (features that maintain constant geometry across views, such as linearity, parallelism, curvilinearity, and symmetry) are visible (Amir, Biederman, & Hayworth,
2012). This learning may be particularly effective for recognition of objects from different categories that differ in their parts and configuration (basic-level recognition; e.g., car vs. boat), but may not be sufficient for discriminating among objects within a category that share similar parts and configuration (subordinate-level recognition; e.g., Toyota vs. Honda; Amir, Biederman, Herald, Shah, & Mintz,
2014). A second perspective suggests that the representation of objects is view-based (Bülthoff & Edelman,
1992; Grill-Spector et al.,
1999; Logothetis & Pauls,
1995), enabling both basic and subordinate recognition (Bülthoff et al.,
1995). According to view-based theories, during learning participants need to be exposed to multiple views of a novel object to learn its “view space” (that is, a continuous space of all possible views of an object; Bülthoff & Edelman,
1992). However, these theories do not specify the expected view-tuning width around each learned view (Bülthoff & Edelman,
1992; Hayward & Tarr,
1997; Wallis, Backus, Langer, Huebner, & Bülthoff,
2009) or whether view tuning depends on learning parameters in addition to the structure of the object. Nevertheless, these theories suggest that spatial and/or temporal continuity among object views during unsupervised training may be key for linking object views into a coherent internal representation (DiCarlo & Cox,
2007; Liu,
2007; Sinha & Poggio,
1996; Wallis et al.,
2009; Wallis & Bülthoff,
2001).
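To make the unspecified notion of a view-tuning width concrete, the following minimal sketch assumes a Gaussian generalization gradient around each learned view, with responses to a probe view pooled across learned views by taking the best match. The Gaussian form, the degree units, the max-pooling rule, and the width parameter `sigma` are illustrative assumptions on our part, not values specified by the view-based theories cited above.

```python
import numpy as np

def view_tuning(theta, learned_views, sigma=20.0):
    """Recognition strength for a probe view at angle `theta` (degrees),
    assuming Gaussian generalization of width `sigma` around each learned
    view. `sigma` is a free parameter here; the cited theories leave it
    unspecified."""
    # Angular distance on a circle (views wrap around at 360 degrees).
    d = np.abs((np.asarray(learned_views, dtype=float) - theta + 180) % 360 - 180)
    # Pool across learned views by taking the best-matching one.
    return np.exp(-d**2 / (2 * sigma**2)).max()

# A single learned view generalizes only narrowly...
print(view_tuning(45, learned_views=[0]))           # ~0.08
# ...whereas multiple learned views tile the view space.
print(view_tuning(45, learned_views=[0, 60, 120]))  # ~0.75
```

Under these assumptions, how far single-view training generalizes is governed entirely by `sigma`, which is exactly the quantity the text notes may depend on learning parameters as well as on object structure.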
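The temporal-continuity idea can likewise be sketched in miniature: if views that appear close together in time are simply associated with one another, unsupervised exposure to rotation sequences suffices to group views by object without any labels. The union-find clustering and the toy view sequences below are illustrative stand-ins for the learning mechanisms proposed in the work cited above, not an implementation of any of those models.

```python
# Each unsupervised "training episode" is a temporally ordered sequence of
# view labels; here, two hypothetical objects each rotated through four views.
episodes = [
    ["A_0deg", "A_30deg", "A_60deg", "A_90deg"],   # object A rotating
    ["B_0deg", "B_30deg", "B_60deg", "B_90deg"],   # object B rotating
]

# Union-find structure over views.
parent = {}

def find(v):
    parent.setdefault(v, v)
    while parent[v] != v:
        parent[v] = parent[parent[v]]  # path compression
        v = parent[v]
    return v

def union(a, b):
    parent[find(a)] = find(b)

# Temporal-association rule: views seen at adjacent moments are linked.
for seq in episodes:
    for u, v in zip(seq, seq[1:]):
        union(u, v)

# Views linked by temporal continuity fall into one cluster per object,
# even though no object labels were ever provided.
clusters = {}
for v in parent:
    clusters.setdefault(find(v), []).append(v)
print(list(clusters.values()))
```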