**People seem to compute the ensemble statistics of objects and use this information to support the recall of individual objects in visual working memory. However, there are many different ways that hierarchical structure might be encoded. We examined the format of structured memories by asking subjects to recall the locations of objects arranged in different spatial clustering structures. Consistent with previous investigations of structured visual memory, subjects recalled objects biased toward the center of their clusters. Subjects also recalled locations more accurately when they were arranged in fewer clusters containing more objects, suggesting that subjects used the clustering structure of objects to aid recall. Furthermore, subjects had more difficulty recalling larger relative distances, consistent with subjects encoding the positions of objects relative to clusters and recalling them with magnitude-proportional (Weber) noise. Our results suggest that clustering improved the fidelity of recall by biasing the recall of locations toward cluster centers to compensate for uncertainty and by reducing the magnitude of encoded relative distances.**

*bias* in your estimate of the location, it decreases *variance* and thus improves overall memory fidelity.

*relative* positions of objects: Rather than remember the absolute position of a paper, you may remember its position relative to your desk (e.g., the paper is one foot northwest of your desk; Hollingworth, 2007; Huttenlocher, Hedges, & Duncan, 1991). This relative encoding may be adapted to accommodate hierarchical structures via an assumption that people encode the relative discrepancy between features of individual objects and the average features of the ensemble. This relative encoding view is consistent with vector-summation models of multiobject motion parsing (Gershman, Tenenbaum, & Jäkel, in press; Johansson, 1973) and spatial positions (Mutluturk & Boduroglu, 2014). Intuitively, instead of remembering the locations of your papers relative to your desk, you may remember the locations of individual papers relative to the centroid of all the papers.

^{1} such that larger relative distances yield greater errors.

*SD* = 45) with the restriction that objects could not overlap. There were 10 unique environments for each clustering structure for a total of 70 environments.

**Figure 1**


*q*) in reporting the locations of two objects as

$$q = \frac{\mathbf{x}_i \cdot \mathbf{x}_j}{\lVert \mathbf{x}_i \rVert \, \lVert \mathbf{x}_j \rVert},$$

where **x**_{i} and **x**_{j} are vectors containing the spatial translational error of the two objects' reported locations. The numerator is the projection of the translational error vectors, with positive values indicating vectors in the same direction and negative values indicating vectors in the opposite direction. The denominator normalizes the numerator, such that *q* falls between −1 and 1. Thus, if the recalled locations of two objects were both shifted in exactly the same direction, *q* would be 1; if they were shifted in orthogonal directions, *q* would be 0; and if they were shifted in opposite directions, *q* would be −1.
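As a minimal sketch of this measure, the error similarity of two objects can be computed as the cosine of the angle between their translational error vectors. The function names below (`translational_error`, `error_similarity`) are illustrative and not taken from the original analysis code, and the sketch assumes two-dimensional locations given as `(x, y)` tuples.

```python
import math


def translational_error(reported, true):
    """Error vector from an object's true location to its reported location."""
    return (reported[0] - true[0], reported[1] - true[1])


def error_similarity(err_i, err_j):
    """Cosine of the angle between two 2-D error vectors (the quantity q)."""
    dot = err_i[0] * err_j[0] + err_i[1] * err_j[1]
    norm_i = math.hypot(err_i[0], err_i[1])
    norm_j = math.hypot(err_j[0], err_j[1])
    if norm_i == 0.0 or norm_j == 0.0:
        # The direction of a zero-length error is undefined, so q is too.
        raise ValueError("error similarity is undefined for a zero-length error")
    return dot / (norm_i * norm_j)
```

Note that the measure depends only on the direction of the two errors, not on their magnitude, so a small and a large error in the same direction still give a similarity of 1.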

*q*) of objects in the same cluster for each environment (Figure 2). We excluded environments without clustering (4C1 and 8C1) from this analysis. For all clustering structures, subjects recalled objects in the same cluster with more similar errors than expected by independent encoding (smallest *t* value: *t*(34) = 16.05, *p* < 0.001). Subjects did not appear to encode the objects independently and instead used the clustering structure of objects.

**Figure 2**


^{2}) of subjects' responses (Figure 3). We used a mixed-effects model that included the number of objects, the number of clusters, and their interaction as fixed effects and subjects as random effects to test whether object load and clustering structure affected recall. RMSE was lower in the four-object conditions than in the eight-object conditions, *t*(241) = 12.47, *p* < 0.001 for the linear effect of number of objects, and decreased as the number of objects in each cluster increased for both the four-object and eight-object conditions, *t*(241) = 16.95, *p* < 0.001 for the linear effect of number of clusters. Post hoc Tukey's honest significant difference (HSD) pairwise comparisons confirmed that performance improved with every increment of cluster size in both the four-object conditions (smallest difference: 13.30, 95% confidence interval = 3.69–22.92, *p* = 0.0042) and the eight-object conditions (smallest difference: 14.71, 95% confidence interval = 3.55–25.88, *p* = 0.0046). The decrease in RMSE with increasing cluster size appeared constant across the four- and eight-object conditions, *t*(241) = .31, *p* = 0.76 for the interaction of the number of objects and the number of clusters, i.e., the difference in slope of RMSE as a function of number of clusters. The effect of clustering structure on performance suggests that subjects did not encode the objects independently and that subjects used clustering to help remember objects more accurately.

**Figure 3**


*p*_{T}, and the probability of making a misassociation between an object and another object's location, *p*_{M} = 1 – *p*_{T}. The probability of misassociating to a particular location then was *p*_{M}/(*n* – 1), where *n* is the number of locations. To determine exactly to which location each object was misassociated, we assumed a bijective mapping of objects to locations (*f*), such that only one object could be paired with each location.

*f*^{−1}(*i*) denotes the inverse mapping from locations to objects.
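The misassociation component can be illustrated with a short sketch. It assumes that misassociations are spread uniformly over the *n* − 1 locations other than an object's own, and it represents the mapping *f* as a list in which entry `obj` holds the location index paired with that object. The helper names are hypothetical and not part of the original model code.

```python
def misassociation_probabilities(p_target, n_locations):
    """Return (p_M, probability of misassociating to one particular other location)."""
    if not 0.0 <= p_target <= 1.0:
        raise ValueError("p_target must lie in [0, 1]")
    if n_locations < 2:
        raise ValueError("a misassociation needs at least two locations")
    p_mis = 1.0 - p_target
    # Assumption: the misassociation mass is shared equally by the other locations.
    return p_mis, p_mis / (n_locations - 1)


def is_bijection(f):
    """True if every location index 0..n-1 is paired with exactly one object."""
    return sorted(f) == list(range(len(f)))


def invert_mapping(f):
    """Inverse mapping: entry `loc` holds the object paired with that location."""
    inverse = [None] * len(f)
    for obj, loc in enumerate(f):
        inverse[loc] = obj
    return inverse
```

Because the mapping is bijective, the inverse is always well defined, which is what allows each response location to be traced back to exactly one object.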

*β*_{c}) and the degree to which objects are drawn toward their cluster centers (object-to-cluster bias, *β*_{o}). Here, a bias of zero indicates the object/cluster is unbiased, and a bias of one indicates the element is drawn completely toward its parent.

*t*, into their relative positions and then weighted the relative positions by the bias parameters. The decomposition of the true locations yielded a relative position tree in which the locations of objects were represented relative to their clusters (*x*), the locations of clusters were relative to the global centroid (*c*), and the global centroid (*g*) was the mean of the true locations (*t*). Conditional on the mapping *f*^{−1}(*i*) of the true locations *t* to response locations *s*, the position of an object *i*'s cluster relative to the global center was defined by

$$c_i = C_{M(f^{-1}(i))} - g,$$

where *M*() maps objects to the clusters of which they are members, and *C* is the absolute position of the cluster center, calculated by averaging the locations of all objects in that cluster. Similarly, the positions of objects relative to their clusters were defined by

$$x_i = t_{f^{-1}(i)} - C_{M(f^{-1}(i))}.$$
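The relative position tree can be sketched in a few lines. The sketch makes two assumptions that are not spelled out in the text: the mapping *f* is taken to be the identity (no misassociations), and the bias parameters are assumed to shrink each relative position by a factor of one minus the bias, so that a bias of zero leaves a position unchanged and a bias of one collapses it onto its parent. All function names are illustrative.

```python
def mean_point(points):
    """Arithmetic mean of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def decompose(true_locations, membership):
    """Split true locations into global centroid g, cluster offsets c, object offsets x."""
    g = mean_point(true_locations)
    members = {}
    for loc, k in zip(true_locations, membership):
        members.setdefault(k, []).append(loc)
    # Absolute cluster centers: the mean of the objects in each cluster.
    centers = {k: mean_point(pts) for k, pts in members.items()}
    c = [(centers[k][0] - g[0], centers[k][1] - g[1]) for k in membership]
    x = [(loc[0] - centers[k][0], loc[1] - centers[k][1])
         for loc, k in zip(true_locations, membership)]
    return g, c, x


def biased_location(g, c_i, x_i, beta_c, beta_o):
    """Expected recalled location under the assumed (1 - bias) weighting."""
    return (g[0] + (1.0 - beta_c) * c_i[0] + (1.0 - beta_o) * x_i[0],
            g[1] + (1.0 - beta_c) * c_i[1] + (1.0 - beta_o) * x_i[1])
```

With both biases at zero the three components sum back to the true location; raising only the object-to-cluster bias to one returns the cluster center, and raising both to one returns the global centroid.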

*σ*_{g}), for locations within the same cluster (*σ*_{c}), and individual object locations (*σ*_{o}). This decomposition of object positions induces an expected correlation structure on the errors in reporting individual objects, which can be parameterized with a covariance matrix, Σ, of the form

*s* denotes the response locations, *n* is the number of objects, *n*_{T} is the number of objects correctly mapped to their locations by *f*, and *n*_{M} is the number of objects incorrectly mapped to their locations by *f*. We estimated these parameters (*f*, *p*_{M}, *β*_{c}, *β*_{o}, *σ*_{g}, *σ*_{c}, *σ*_{o}) for each environment across subjects using a Markov chain Monte Carlo algorithm (see Appendix C for details of the algorithm and Appendix D for all parameter fits).

*β*_{o}: *M* = .19, *SEM* = .02, max = .62), suggesting that subjects remembered the locations of individual objects within their clustering structure rather than storing chunks and discarding their internal components. Additionally, contrary to the pattern of bias we expected to find if subjects encoded objects in a hierarchical generative model, as objects were arranged in fewer clusters containing more objects, the objects tended to be recalled with less bias toward their clusters (Figure 4), *t*(47) = 7.14, *p* < 0.001 (for the linear effect of number of clusters on *β*_{o} in a model including fixed effects of number of objects and number of clusters). Post hoc Tukey's HSD pairwise comparison tests confirmed that objects' bias toward their clusters varied with the number of clusters for the four-object conditions (smallest difference: .099, 95% confidence interval = .060–.14, *p* < 0.001). With the exception of the 2C4 and 1C8 conditions (difference: .11, 95% confidence interval = −.017–.24, *p* = 0.098), the bias of objects toward their clusters also varied for the eight-object conditions (smallest difference: .16, 95% confidence interval = .031–.29, *p* = 0.01). However, even though the bias of objects toward their clusters was generally low, objects were consistently recalled with *some* bias. Together, this pattern of bias suggests that subjects encoded objects in a hierarchical generative model but did not rely primarily on this form of representation.

**Figure 4**


^{3} require larger relative distances to represent positions. Consequently, as the dispersion of clusters in the environment increases, subjects should recall clusters less precisely (that is, *σ*_{c} should increase). The dispersion of clusters in an environment was significantly correlated with the precision with which subjects recalled cluster centers (*r* = 0.38, *p* < 0.01; Figure 5), consistent with subjects encoding objects according to their relative positions and having difficulty recalling larger relative distances.
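The link between relative distance and recall error can be illustrated with a small simulation of magnitude-proportional (Weber) noise. This is only a sketch under stated assumptions: the noise is taken to be isotropic Gaussian with a standard deviation equal to a hypothetical `weber_fraction` times the length of the encoded relative position. Neither the function names nor the parameter value come from the fitted model.

```python
import math
import random


def recall_with_weber_noise(relative_position, weber_fraction, rng):
    """Recall a relative position with noise proportional to its magnitude."""
    dx, dy = relative_position
    sd = weber_fraction * math.hypot(dx, dy)  # larger offsets give larger noise
    return (rng.gauss(dx, sd), rng.gauss(dy, sd))


def rmse_of_recall(relative_position, weber_fraction, n_trials, rng):
    """Root mean square error of repeated noisy recalls of one relative position."""
    dx, dy = relative_position
    total = 0.0
    for _ in range(n_trials):
        rx, ry = recall_with_weber_noise(relative_position, weber_fraction, rng)
        total += (rx - dx) ** 2 + (ry - dy) ** 2
    return math.sqrt(total / n_trials)
```

Under this assumption, a cluster center that lies four times farther from the global centroid is recalled with roughly four times the error, which is the qualitative pattern the dispersion analysis tests for.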

**Figure 5**


*SD* = .033). The chunking, hierarchical generative, and relative position models all use the maximum likelihood clustering structures of the environments estimated by the nonparametric Dirichlet process.

*σ*) estimated by our error model separately for the four-object and eight-object conditions.

*r* = .55, 95% confidence interval = .37–.70). The relative position model fit environments across clustering structures slightly better than the hierarchical generative model (hierarchical generative: *r* = 0.70, 95% confidence interval = .56–.80; relative position: *r* = 0.89, 95% confidence interval = .82–.93). Within clustering structures, the hierarchical generative and relative position models generally predicted the difficulty of environments accurately. Notably, however, the hierarchical generative model matched subjects' behavior particularly poorly for 1C4 and 1C8 environments. This is most likely because when all the objects are in a single cluster, the hierarchical generative model tends to recall objects excessively biased toward the cluster centers. Instead, as our analysis of the bias of objects toward their clusters demonstrated, subjects retained considerably more information about the individual objects in these one-cluster environments. This pattern and the relative position model's better ability to predict behavior suggest that relative position encoding dominated subjects' errors.

**Table 1**

*infer* object properties, people also use the hierarchy to *encode* object properties as relative offsets from the central tendency of their group. Because relative positions seem to be recalled with Weber noise, hierarchical clustering reduces the number of large distances that subjects encoded and thus increases overall accuracy.

*each other* with greater bias exerted by nearby objects (e.g., like gravity, with force dropping off with distance). Unfortunately, our results cannot distinguish whether objects were biased toward each other or toward inferred cluster centers.

^{4} and because they seem insufficient to attain the precision exhibited by visual spatial memory. Because verbally encoded spatial relations (such as “above” or “left”) offer only imprecise location information, we suspect that the main benefit of such verbal encoding was to reduce misassociations between objects (Lew, Pashler, & Vul, in press) rather than to encode the locations themselves. Additionally, patterns of oculomotor movements and attentional shifts could have influenced performance by interfering with encoding in visual memory (Lawrence, Myerson, & Abrams, 2004). Although the uniform distribution of cluster centers in our study still mandates many changes of fixation, it is possible that clustering yields fewer eye movements and attentional shifts between objects in the same cluster, improving the fidelity of memories. Our presentation times were also longer than those of most visual working memory studies, which may have given subjects more time to encode objects. Given that performance appears to asymptote with display times shorter than those used in the current study (Bays, Gorgoraptis, Wee, Marshall, & Husain, 2011), our results may reflect how people encode stimuli when given enough time to thoroughly observe all objects. Varying the encoding time, delay time, or the environment statistics might reveal how people navigate the space of possible encoding schemes.

*Proceedings of the National Academy of Sciences, USA*, 106 (18), 7345–7350.

*The Journal of Neuroscience*, 31 (3), 1128–1138.

*Psychological Science*, 12 (2), 157–162.

*Science*, 321, 851–854.

*Psychological Science*, 22 (3), 384–392.

*Journal of Experimental Psychology: General*, 138 (4), 487–502.

*Proceedings of the National Academy of Sciences, USA*, 105 (38), 14325–14329.

*Psychological Science*, 24 (6), 981–990.

*Psychological Review*, 120 (1), 85–109.

*Behavioral and Brain Sciences*, 24 (1), 87–114.

*Journal of Experimental Psychology: General*, 143 (2), 548–565.

*Recent Advances in Statistics*, 24, 287–302.

*Pattern Analysis and Machine Intelligence, IEEE Transactions on*, 6, 721–741.

*Vision Research*.

*Journal of Experimental Psychology: Human Perception and Performance*, 40 (5), 1779–1788.

*Journal of Experimental Psychology: Human Perception and Performance*, 33 (1), 31–47.

*Psychological Review*, 98 (3), 352.

*Perception*, 43 (7), 663–676.

*Perception & Psychophysics*, 14 (2), 201–211.

*Psychonomic Bulletin & Review*, 11 (3), 488–494.

*Nature Neuroscience*, 17 (3), 347–356.

*Psychological Review*, 63 (2), 81–97.

*Attention, Perception, & Psychophysics*, 76 (8), 2276–2285.

*Psychological Review*, 120 (2), 297–328.

*Current Directions in Psychological Science*, 23 (3), 164–170.

*Psychological Review*, 119 (4), 807–830.

*Experimental Brain Research*, 207 (3–4), 221–231.

*Cognitive Psychology*, 24 (3), 295–340.

*Nature*, 453, 233–235.

^{3} In our study, we held the standard deviation of objects within clusters constant, preventing us from analyzing the effect of relative distance on the accuracy of objects. We predict that this relationship between relative distance and accuracy should remain true for objects within the same cluster.

^{4} Although Brady et al. (2013) assessed the influence of verbal strategies in long-term visual memory, they also found that both short- and long-term visual memory rely on similar representations; thus, it seems reasonable to apply their findings to short-term memories in our experiments. Moreover, the greater precision in short-term memory would seem to make verbal encoding even less effective here than in long-term memory.

*q*) to measure whether subjects recalled clustered objects with more similar errors. The error similarity of objects in the same cluster was consistently greater than 0, *t*(58) = 23.83, *p* < 0.001 (Figure A1), indicating that memory errors did not accumulate homogeneously for all objects. Instead, subjects' responses respected the clustering structure of the objects.

**Figure A1**


*t*(438) = 1.56, *p* = 0.12 for the linear effect of the X-position bin in a model including the fixed effect of the X-position bin). However, the Y-dimension of an object's position did affect the magnitude of errors (*t*(438) = 2.90, *p* = 0.003 for the linear effect of the Y-position bin in a model including the fixed effect of the Y-position bin), such that errors in the Y-dimension increased toward the bottom of the environment. Given that the environments were symmetrical, this most likely reflects subjects initially dragging objects from below the environment to place them rather than subjects using salient positions or landmarks.

**Figure A2**


Let Θ^{(i)} be the set of parameters {*p*_{M}^{(i)}, *β*_{c}^{(i)}, *β*_{o}^{(i)}, *σ*_{g}^{(i)}, *σ*_{c}^{(i)}, *σ*_{o}^{(i)}} at iteration *i* and *f*^{(i)} be the mapping of true locations to response locations at iteration *i*. In each iteration, the algorithm samples the values of the parameters that compose Θ conditional on the current mapping *f* and then samples the mapping *f* conditional on the previously sampled value of Θ. The exact algorithm is:

- 1. Choose random starting values for the parameters *f*^{(0)} and Θ^{(0)}.
- 2. At iteration *i*, draw a candidate Θ^{*} from its proposal distribution *P*(Θ^{*}|Θ^{(i–1)}).
- 4. Accept Θ^{*} as Θ^{(i)} with probability min(*a*, 1). If Θ^{*} is not accepted, then Θ^{(i)} = Θ^{(i–1)}.
- 5. Draw a candidate *f*^{*} from its proposal distribution *Q*(*f*^{*}|*f*^{(i–1)}, Θ^{(i)}).
- 7. Accept *f*^{*} as *f*^{(i)} with probability min(*a*, 1). If *f*^{*} is not accepted, then *f*^{(i)} = *f*^{(i–1)}.
- 8. Repeat steps 2–7 *N* times to get *N* samples of *f* and Θ.
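The alternating structure of this sampler can be sketched on a toy problem. The sketch below is not the authors' implementation: it uses a one-dimensional Gaussian likelihood with a single noise parameter `sigma` in place of the full parameter set Θ, a symmetric random-walk proposal for `sigma`, a uniformly chosen swap for the mapping, and the standard Metropolis acceptance ratio (the ratio of likelihoods), since the definition of *a* is not given in the steps listed here. All names are hypothetical.

```python
import math
import random


def log_likelihood(responses, true_locations, f, sigma):
    """Toy model: response i is a Gaussian-noised report of true location f[i]."""
    total = 0.0
    for i, s in enumerate(responses):
        resid = s - true_locations[f[i]]
        total += -0.5 * (resid / sigma) ** 2 - math.log(sigma)
    return total


def accept(log_ratio, rng):
    """Accept with probability min(a, 1), where a = exp(log_ratio)."""
    return rng.random() < math.exp(min(0.0, log_ratio))


def alternating_metropolis(responses, true_locations, n_samples, rng, step_sd=1.0):
    n = len(responses)
    # Step 1: random starting values for the mapping and the parameter.
    f = list(range(n))
    rng.shuffle(f)
    sigma = rng.uniform(1.0, 10.0)
    samples = []
    for _ in range(n_samples):
        # Steps 2-4: propose and accept/reject the parameter given the mapping.
        sigma_star = rng.gauss(sigma, step_sd)
        if sigma_star > 0.0:  # proposals outside the support are rejected
            log_a = (log_likelihood(responses, true_locations, f, sigma_star)
                     - log_likelihood(responses, true_locations, f, sigma))
            if accept(log_a, rng):
                sigma = sigma_star
        # Steps 5-7: propose a swap and accept/reject the mapping given the parameter.
        j, k = rng.sample(range(n), 2)
        f_star = list(f)
        f_star[j], f_star[k] = f_star[k], f_star[j]
        log_a = (log_likelihood(responses, true_locations, f_star, sigma)
                 - log_likelihood(responses, true_locations, f, sigma))
        if accept(log_a, rng):
            f = f_star
        # Step 8: record the current state and repeat.
        samples.append((list(f), sigma))
    return samples
```

A call such as `alternating_metropolis([0.5, 99.0, 201.0], [0.0, 100.0, 200.0], 200, random.Random(0))` returns 200 `(mapping, sigma)` pairs, from which an initial portion would be discarded as burn-in.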

For *P*(Θ^{*}|Θ^{(i–1)}), we used truncated normal distributions for each parameter's proposal distribution (the truncation enforced the constraints that the noise parameters must be greater than zero and the bias parameters and misassociation probabilities must be between zero and one). Noise proposal distributions had a standard deviation of 2.5, and bias and probability proposal distributions had a standard deviation of .1.
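One simple way to draw from such a truncated normal proposal is rejection sampling: draw from the untruncated normal centered on the current value and redraw until the candidate falls inside the allowed interval. The sketch below assumes this approach; the text does not say how the truncated draws were actually generated, and the function and constant names are illustrative.

```python
import math
import random

NOISE_PROPOSAL_SD = 2.5  # standard deviation for noise parameters (from the text)
BIAS_PROPOSAL_SD = 0.1   # standard deviation for bias and probability parameters


def propose_truncated_normal(current, sd, lower, upper, rng):
    """Draw from Normal(current, sd) truncated to the open interval (lower, upper)."""
    while True:
        candidate = rng.gauss(current, sd)
        if lower < candidate < upper:
            return candidate


def propose_noise(current, rng):
    """Noise parameters must be greater than zero."""
    return propose_truncated_normal(current, NOISE_PROPOSAL_SD, 0.0, math.inf, rng)


def propose_bias(current, rng):
    """Bias and misassociation parameters must lie between zero and one."""
    return propose_truncated_normal(current, BIAS_PROPOSAL_SD, 0.0, 1.0, rng)
```

Truncation makes the proposal asymmetric when the current value sits near a bound, so a Metropolis-Hastings acceptance ratio would ordinarily include the ratio of proposal densities; whether the original sampler applied that correction is not stated in this passage.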

For *Q*(*f*^{*}|*f*^{(i–1)}, Θ^{(i)}), we sampled two unique objects based on the inverse likelihood that they came from their currently assigned locations. Intuitively, this tends to select the two objects that are currently least likely to be assigned to the correct locations. We then swapped the assignments of the sampled objects to create a new mapping proposal.
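The swap proposal can be sketched as weighted sampling without replacement followed by an exchange of assignments. The sketch interprets "inverse likelihood" as sampling weights proportional to one over each object's likelihood under its current assignment, which is an assumption; the small floor `eps` that guards against division by zero is also an addition for illustration, and the function name is hypothetical.

```python
import random


def propose_swap(f, likelihoods, rng, eps=1e-12):
    """Swap the assignments of two objects sampled with weights 1 / likelihood."""
    n = len(f)
    # Objects that fit their current location poorly get larger weights.
    weights = [1.0 / max(lik, eps) for lik in likelihoods]
    first = rng.choices(range(n), weights=weights, k=1)[0]
    remaining = [j for j in range(n) if j != first]
    second = rng.choices(remaining, weights=[weights[j] for j in remaining], k=1)[0]
    proposal = list(f)  # leave the current mapping untouched
    proposal[first], proposal[second] = proposal[second], proposal[first]
    return proposal
```

Because a swap exchanges two entries of a bijective mapping, the proposal is always itself a bijective mapping, so the one-object-per-location constraint never has to be checked separately.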

*N* to 3200 and treated the first 800 samples as burn-in.

**Table A1**

**Table A2**

*p*_{M})^{5} increased with the number of objects, it was unaffected by the clustering structure of objects. This suggests that subjects did not use the clustering structure of objects to minimize binding errors.

*β*_{c}). The decreasing bias of clusters toward the global center may suggest that subjects relied on a representation of objects' hierarchical generative model when remembering the locations of clusters, relying less on the location of the global center as the number of clusters decreased. However, it is unclear why this pattern did not extend to objects' bias toward their clusters.

*σ*_{c}) and objects (*σ*_{o}) more accurately. The decreasing noise of cluster and object memories is consistent with the relative position model: organizing objects into fewer clusters should decrease the magnitude of the relative positions needed to represent the objects' and clusters' locations. The clustering structure of objects had an unclear effect on the noise of the global center (*σ*_{g}), i.e., the error that is shared among all objects in a display. Subjects appeared to remember the global center more accurately as the number of clusters decreased, but this benefit went away when objects were arranged in a single cluster. The sudden increase in the noise of the global center may reflect subjects focusing on encoding the locations of the individual objects at the cost of the global center when they do not need to remember the clustering structure of objects. Consequently, it is difficult to determine exactly how the objects' clustering structure influenced memories of the global center.

*q*) of objects compared to the actual clustering structures used to generate the locations of the objects. For each condition, we found the average error similarity of objects in the same cluster (Figure A3). If no objects were in the same cluster, we calculated the average error similarity over all objects.

**Figure A3**


*t*(34) = 8.20, *p* < 0.001, and 8C1, *t*(34) = 14.20, *p* < 0.001. This demonstrates that the Dirichlet process grouped objects as subjects did even when there was no intended clustering structure. In the other conditions, the error similarity of objects that were actually from the same cluster was similar to that of objects that the Dirichlet process inferred were from the same cluster, suggesting that both subjects and the Dirichlet process recovered the intended clustering structures.