Abstract
Statistical learning has been demonstrated to be a valid mechanism for learning the structure of invented languages: mere exposure to sequences of nonsensical syllables allows learners to pick up the transitional probabilities that reflect the rules of a language. Does the acquisition of scene grammar, i.e., the rules governing the structure of naturalistic scene environments, occur in a similar way, or is it better served by more interactive approaches that more closely reflect how we actually learn our environments? To answer this question, we created several "invented" scenes of 12 "nonsensical" objects each. These objects were divided into "anchors" (usually larger objects that predict the locations of other objects) and "locals," each of which was assigned to an anchor and appeared close to it 80% of the time. Participants were exposed to these objects in a virtual reality environment in two blocks (active search vs. passive viewing, using two different scenes), each followed by a rating phase in which they judged how much any two objects "belong" to each other. In both blocks, a scene was presented 20 times with a randomized object arrangement that adhered to its predefined grammar (anchor-local co-occurrence frequency). In the passive block, participants were instructed to simply look at each object in turn and press a button, whereas in the active block they were shown a miniature object cue and then searched for that object as fast as possible. The "belonging" ratings for each object pair showed a similar pattern in both blocks; for example, pairs of anchors were rated as less associated with each other than locals were with their assigned anchors. This differentiation was, however, substantially stronger in the active search block, where pairs that belonged together differed significantly from other pairs involving locals, supporting our hypothesis that goal-directed search is more efficient for scene grammar acquisition than passive viewing.
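To make the statistical structure of the stimuli concrete, the following is a minimal Python sketch of how one randomized arrangement adhering to such a scene grammar could be generated. The abstract specifies only 12 objects per scene, the anchor-local assignment, the 80% co-occurrence rule, and 20 presentations per block; the 4-anchor/8-local split and all object names below are hypothetical assumptions, not the authors' actual stimulus code.

```python
import random

# Hypothetical parameters: the abstract states 12 objects per scene and an
# 80% anchor-local co-occurrence frequency; the 4/8 split is an assumption.
ANCHORS = ["anchor_A", "anchor_B", "anchor_C", "anchor_D"]
LOCALS = {  # each local is assigned to one anchor (the scene "grammar")
    "local_1": "anchor_A", "local_2": "anchor_A",
    "local_3": "anchor_B", "local_4": "anchor_B",
    "local_5": "anchor_C", "local_6": "anchor_C",
    "local_7": "anchor_D", "local_8": "anchor_D",
}
P_NEAR_OWN_ANCHOR = 0.8  # anchor-local co-occurrence frequency

def generate_arrangement(rng: random.Random) -> dict[str, str]:
    """Place each local near its assigned anchor with probability 0.8,
    otherwise near a randomly chosen different anchor."""
    placement = {}
    for local, own_anchor in LOCALS.items():
        if rng.random() < P_NEAR_OWN_ANCHOR:
            placement[local] = own_anchor
        else:
            placement[local] = rng.choice(
                [a for a in ANCHORS if a != own_anchor])
    return placement

rng = random.Random(0)
# One block: the scene is presented 20 times, each with a fresh
# randomized arrangement that follows the predefined grammar.
presentations = [generate_arrangement(rng) for _ in range(20)]
print(presentations[0])
```

Across the 20 presentations, each local ends up near its own anchor on roughly 80% of trials, which is the regularity participants could, in principle, extract through either passive viewing or active search.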