**A representation of shape that is low dimensional and stable across minor disruptions is critical for object recognition. Computer vision research suggests that such a representation can be supported by the medial axis, a computational model for extracting a shape's internal skeleton. However, few studies have shown evidence of medial axis processing in humans, and even fewer have examined how the medial axis is extracted in the presence of disruptive contours. Here, we tested whether human skeletal representations of shape reflect the medial axis transform (MAT), a computation sensitive to all available contours, or a pruned medial axis, which ignores contours that may be considered “noise.” Across three experiments, participants (N = 2062) were shown complete, perturbed, or illusory two-dimensional shapes on a tablet computer and were asked to tap the shapes anywhere once. When directly compared with another viable model of shape perception (based on principal axes), participants' collective responses were better fit by the medial axis, and a direct test of boundary avoidance suggested that this result was not likely because of a task-specific cognitive strategy (Experiment 1). Moreover, participants' responses reflected a pruned computation in shapes with small or large internal or external perturbations (Experiment 2) and under conditions of illusory contours (Experiment 3). These findings extend previous work by suggesting that humans extract a relatively stable medial axis of shapes. A relatively stable skeletal representation, reflected by a pruned model, may be well equipped to support real-world shape perception and object recognition.**

^{1}By contrast, the latent variable models allowed us to estimate the fit of participants' responses to each structure while controlling for length (see the Appendix for the notation of the latent variable models). These analyses assume that if participants' responses correspond to the model of interest, then these responses should reflect a point along that model's structure and some tapping error (constrained by the border of the shape). Maximum likelihood estimation was used to determine the variance terms that best described the distribution of participants' responses around each point of a model's respective structure (see the Appendix for the notation of the likelihood functions).
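As a rough illustration of this estimation step, the sketch below fits the single variance term for one candidate structure by a grid search over *σ*. It rests on several assumptions that are not taken from the study: the shape is a rectangle (so the truncation constant has a closed form), the structure is approximated by a discrete set of points, and the names `log_likelihood` and `fit_sigma` are hypothetical.

```python
import math
import numpy as np

def _phi_cdf(z):
    """Standard normal CDF applied elementwise to an array."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def log_likelihood(responses, axis_pts, sigma, width, height):
    """Log-likelihood of taps under a truncated-normal latent variable model.

    Assumptions: the shape is the rectangle [0, width] x [0, height] and the
    model structure is approximated by the discrete points in axis_pts.
    """
    # Squared distance from every response (n) to every axis point (m).
    diff = responses[:, None, :] - axis_pts[None, :, :]
    sq = np.sum(diff ** 2, axis=2)
    dens = np.exp(-sq / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
    # Probability that an untruncated tap around each axis point lands in S.
    px = _phi_cdf((width - axis_pts[:, 0]) / sigma) - _phi_cdf(-axis_pts[:, 0] / sigma)
    py = _phi_cdf((height - axis_pts[:, 1]) / sigma) - _phi_cdf(-axis_pts[:, 1] / sigma)
    # Average the truncated densities over the (equally weighted) axis points.
    mix = np.mean(dens / (px * py)[None, :], axis=1)
    return float(np.sum(np.log(mix)))

def fit_sigma(responses, axis_pts, width, height, grid=None):
    """Grid-search maximum likelihood estimate of sigma and its log-likelihood."""
    grid = np.linspace(0.1, 3.0, 59) if grid is None else grid
    lls = [log_likelihood(responses, axis_pts, s, width, height) for s in grid]
    best = int(np.argmax(lls))
    return float(grid[best]), lls[best]
```

Running `fit_sigma` once per candidate structure on the same responses yields one maximized log-likelihood per model, which is the quantity a fit-based comparison would then use.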

*inside* the shape and, therefore, *away* from the boundaries. To address this possibility, we tested boundary avoidance models that were based on the following two assumptions. First, if participants' responses are systematically biased away from the boundary, then they would be distributed across a smaller portion of the shape's area whose edges are equidistant from the boundary. Second, if participants' responses do not correspond to any specific model of shape perception, their responses would be randomly distributed within that smaller region of the shape. Such a response pattern would be best fit by a uniform distribution, defined here as a grid of equally spaced points inside the shape. Because it was unclear to what extent participants should avoid the boundary, we tested the boundary avoidance model with multiple possible degrees of avoidance.
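A minimal sketch of how such a boundary avoidance model could be constructed is given here. It assumes a rectangular shape for simplicity (the shapes used in the experiments are more varied), and the names `avoidance_grid`, `margin`, and `spacing` are hypothetical. The uniform distribution is represented by a grid of equally spaced points restricted to the inner region that lies at least `margin` from every edge, and several margins stand in for the multiple degrees of avoidance.

```python
import numpy as np

def avoidance_grid(width, height, margin, spacing):
    """Equally spaced grid points at least `margin` away from every edge
    of the rectangle [0, width] x [0, height] (an assumed, simplified shape)."""
    xs = np.arange(0.0, width + 1e-9, spacing)
    ys = np.arange(0.0, height + 1e-9, spacing)
    gx, gy = np.meshgrid(xs, ys)
    pts = np.column_stack([gx.ravel(), gy.ravel()])
    # Distance from each grid point to the nearest edge of the rectangle.
    dist = np.minimum.reduce([pts[:, 0], width - pts[:, 0],
                              pts[:, 1], height - pts[:, 1]])
    return pts[dist >= margin - 1e-9]

# One candidate model per assumed degree of avoidance.
models = {m: avoidance_grid(10.0, 6.0, m, 0.5) for m in (0.0, 1.0, 2.0)}
```

Each entry of `models` can then be treated like any other set of model points when fitting participants' responses.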

^{1}In the current study, we compared models on the basis of their fit to participants' responses, such that the explanatory power of each axis point was considered in combination with response distance (see the General Analyses section and Appendix). When the length of each model is controlled for using this method, we find that participants' responses are best fit by a leniently pruned medial axis model with branches to the outer corners but not to the perturbations.

*pruning* may prove to be a misnomer. This possibility is consistent with evidence of medial axis processing in higher-level visual areas that are less edge sensitive (e.g., Hung et al., 2012). According to this view, the medial axis is computed from a shape that has already been filtered for noisy contours or undergone perceptual completion (Kourtzi & Kanwisher, 2001) and thus has no need to extract additional branches. Some researchers have also suggested that a pruned medial axis may arise in early visual areas via top-down attentional mechanisms that extract only the relevant properties of the shape (Ardila et al., 2012). According to this account, feedback mechanisms may be recruited in certain contexts (e.g., subordinate-level categorization) to provide higher-resolution information about an object's contours (Lee, Mumford, Romero, & Lamme, 1998), such that, instead of being pruned, the medial axis “grows” new branches during visual processing to accommodate greater levels of detail. It is also possible that the aforementioned accounts are not mutually exclusive and that an object's shape may be represented simultaneously at multiple scales of detail by medial axes with various degrees of pruning (Green, 2017; Hummel, 2013).

*PLoS One*, 8 (11), https://doi.org/10.1371/journal.pone.0078985.

*Medial axis generation in a model of perceptual organization*. Paper presented at the 2012 46th Annual Conference on Information Sciences and Systems (CISS), Princeton, NJ.

*Mathematical foundations of scientific visualization, computer graphics, and massive data exploration* (pp. 109–125). Berlin: Springer.

*bioRxiv*, 1–22, https://doi.org/10.1101/518795.

*Psychological Review*, 94, 115–147.

*Models for the perception of speech and visual form* (pp. 362–380). Cambridge, MA: MIT Press.

*Journal of Theoretical Biology*, 38, 205–287.

*Pattern Recognition*, 10, 167–180, https://doi.org/10.1016/0031-3203(78)90025-0.

*Vision Research*, 39, 257–269.

*Perceptual organization* (pp. 99–118). Hillsdale, NJ: Erlbaum.

*Statistical inference* (2nd ed.). Pacific Grove, CA: Duxbury.

*Journal of Experimental Psychology: Human Perception and Performance*, 45, 111–124, https://doi.org/10.1037/xhp0000592.

*Monthly Notices of the Royal Astronomical Society*, 454, 1140–1156, https://doi.org/10.1093/mnras/stv1996.

*Journal of Experimental Psychology: Animal Behavior Processes*, 31, 254–259, https://doi.org/10.1037/0097-7403.31.2.254.

*Bootstrap methods and their application*. Cambridge, UK: Cambridge University Press.

*Cognition*, 99, 275–325, https://doi.org/10.1016/j.cognition.2005.03.004.

*Role of scale in partitioning shape*. Paper presented at the Proceedings of the International Conference on Image Processing, Rochester, NY.

*An augmented fast marching method for computing skeletons and centerlines*. Paper presented at the Proceedings of VisSym, Barcelona, Spain.

*An introduction to the bootstrap*. Boca Raton, FL: Chapman & Hall.

*Annual Review of Vision Science*, 4, 423–450, https://doi.org/10.1146/annurev-vision-091517-034110.

*Proceedings of the National Academy of Sciences, USA*, 103, 18014–18019.

*Shape perception in human and computer vision: An interdisciplinary perspective* (pp. 55–70). London: Springer.

*Psychological Science*, 25, 377–386.

*International Journal of Computer Vision*, 54, 143–157.

*British Journal for the Philosophy of Science*, 68, 355–387, https://doi.org/10.1093/bjps/axv042.

*Journal of Vision*, 9 (6): 13, 1–21. https://doi.org/10.1167/9.6.13.

*Cognition*, 18, 65–96, https://doi.org/10.1016/0010-0277(84)90022-2.

*Journal of Physiology*, 148, 574–591, https://doi.org/10.1113/jphysiol.1959.sp006308.

*Oxford handbook of cognitive psychology* (pp. 32–46). New York: Oxford University Press.

*Computational processes in human vision: An interdisciplinary perspective* (pp. 430–444). Norwood, NJ: Ablex.

*Quarterly Journal of Experimental Psychology*, 46, 137–159.

*Neuron*, 74, 1099–1113.

*The emerging spatial mind* (pp. 3–24). New York: Oxford University Press.

*Perceptual organization as object recognition divided by two*. Paper presented at the Workshop on Perceptual Organization in Computer Vision, Vancouver, Canada.

*Scientific American*, 234, 48–52.

*Communicative & Integrative Biology*, 4, 710–712.

*Journal of Physiology-Paris*, 97, 155–190.

*International Journal of Computer Vision*, 15, 189–224, https://doi.org/10.1007/bf01451741.

*Science*, 293, 1506–1509.

*Vision Research*, 38, 2323–2333.

*Nature*, 370, 644–646.

*Neurophysiological evidence for image segmentation and medial axis computation in primate V1*. Paper presented at the Computation and Neural Systems: Proceedings of the Fourth Annual Computational Neuroscience Conference, Location.

*Journal of Physiology-Paris*, 97, 121–139.

*Vision Research*, 38, 2429–2454.

*Cerebral Cortex*, 23, 629–637.

*Cognitive Science*, 13, 357–387, https://doi.org/10.1207/s15516709cog1303_2.

*Symmetry, causality, mind*. Cambridge, MA: MIT Press.

*Advances in neural information processing systems* (Vol. 12, pp. 136–142). Cambridge, MA: MIT Press.

*Pattern Recognition Letters*, 34, 1138–1145, https://doi.org/10.1016/j.patrec.2013.03.013.

*Approximate tree matching and shape similarity*. Paper presented at the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece.

*Attention, Perception, & Psychophysics*, 80, 1278–1289, https://doi.org/10.3758/s13414-017-1457-8.

*Proceedings of the Royal Society of London B: Biological Sciences*, 200, 269–294.

*Vision Research*, 39, 2929–2946, https://doi.org/10.1016/S0042-6989(99)00029-2.

*Computers & Graphics*, 36, 477–487.

*Pattern Recognition*, 28, 343–359, https://doi.org/10.1016/0031-3203(94)00105-U.

*IEEE Transactions on Pattern Analysis and Machine Intelligence*, PAMI-9, 505–511, https://doi.org/10.1109/TPAMI.1987.4767938.

*Journal of Experimental Psychology: Human Perception and Performance*, 4, 101–111.

*Annals of Statistics*, 6, 461–464, https://doi.org/10.1214/aos/1176344136.

*IEEE Transactions on Pattern Analysis and Machine Intelligence*, 26, 550–571.

*Computer Vision and Image Understanding*, 69, 156–169, https://doi.org/10.1006/cviu.1997.0598.

*International Journal of Computer Vision*, 35, 13–32.

*Perception & Psychophysics*, 61, 636–660, https://doi.org/10.3758/bf03205536.

*Vision Research*, 126, 330–346, https://doi.org/10.1016/j.visres.2015.08.009.

*Scientific Reports*, 6, 1–11, https://doi.org/10.1038/srep36245.

*Animal Behavior and Cognition*, 4, 267–285.

*Optimal inference for hierarchical skeleton abstraction*. Paper presented at the Proceedings of the 17th International Conference on Pattern Recognition, Cambridge, UK.

*International Journal of Computer Vision*, 94, 215–240.

*Vision Research*, 43, 1637–1653, https://doi.org/10.1016/S0042-6989(03)00168-8.

*Journal of Experimental Psychology: General*, 138, 546–560, https://doi.org/10.1037/a0017352.

*Psychological Science*, 19, 645–647.

*Perception*, 37, 207–244.

*Psychological Bulletin*, 138, 1172–1217.

*Perception*, 15, 355–366.

*Cortex*, 9, 152–164.

*Multimedia Tools and Applications*, 76, 8285–8303, https://doi.org/10.1007/s11042-016-3395-1.

^{1}A difficulty with comparing MAT and pruned models is that the added length of the MAT confers a statistical advantage when models are compared on the basis of response distance alone. To address this issue, Firestone and Scholl (2014) randomly sampled an equal number of points from each model. However, in comparisons of response distance, this approach does not account for the statistical advantage of the longer model. The points will still be distributed in a greater portion of the shape and will therefore be more likely to capture participants' responses by chance. For example, using this approach, a model consisting of evenly spaced points across the shape would outperform any of the axis models tested here because a participant's response would always be within close proximity of any point on the grid, regardless of where it fell. Therefore, as described in the main text, it is important that models be evaluated on the basis of their fit to participants' responses, such that the explanatory power of each axis point is considered in combination with response distance.
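The chance advantage of a more widely distributed model can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings (a 10 × 6 rectangle, taps placed uniformly at random, a single horizontal line standing in for an axis, and a hypothetical helper `mean_nearest_distance`); it is not the analysis reported in the main text.

```python
import numpy as np

def mean_nearest_distance(responses, model_pts):
    """Mean distance from each response to its closest model point."""
    diff = responses[:, None, :] - model_pts[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=2)).min(axis=1).mean())

rng = np.random.default_rng(1)
# Taps placed uniformly at random: they follow no shape model at all.
taps = rng.uniform([0, 0], [10, 6], size=(2000, 2))

# A short "axis": 50 points along the horizontal midline.
axis = np.column_stack([np.linspace(2, 8, 50), np.full(50, 3.0)])

# A grid covering the shape, subsampled to the same number of points.
gx, gy = np.meshgrid(np.linspace(0.5, 9.5, 10), np.linspace(0.5, 5.5, 6))
grid = np.column_stack([gx.ravel(), gy.ravel()])
grid50 = grid[rng.choice(len(grid), size=50, replace=False)]

d_axis = mean_nearest_distance(taps, axis)    # distance-based score, axis
d_grid = mean_nearest_distance(taps, grid50)  # distance-based score, grid
```

Even with the number of points equated, the grid yields the smaller mean distance for responses that carry no structure, which is why response distance alone cannot arbitrate between models of different extent.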

*S* and *T* denote the two-dimensional subspace within a shape (i.e., feasible response region) and the axis points, respectively. This latent variable model assumes that each response is generated as follows:

- STEP 1: Sample a point **θ** = (*θ*_{1}, *θ*_{2}) uniformly from *T*.
- STEP 2: Given **θ**, sample a two-dimensional error **ε** = (*ε*_{1}, *ε*_{2}), where *ε*_{1} and *ε*_{2} are independently sampled from a normal distribution with mean 0 and standard deviation *σ*.
- STEP 3: Let **X** = **θ** + **ε**. If **X** is within the feasible region *S*, stop. Otherwise, repeat STEP 2 until **X** falls in *S*.
- STEP 4: Output **X** as the response.
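The four steps above amount to rejection sampling and can be sketched as follows. The function name `sample_response`, the discretization of *T* into points, the `max_tries` guard, and the rectangular example shape are all assumptions made for illustration.

```python
import numpy as np

def sample_response(axis_pts, sigma, in_shape, rng, max_tries=10_000):
    """Draw one response following Steps 1-4.

    axis_pts : (m, 2) array approximating the structure T (assumed discretization)
    in_shape : callable returning True when a point lies in the feasible region S
    """
    theta = axis_pts[rng.integers(len(axis_pts))]   # Step 1: latent ideal point
    for _ in range(max_tries):
        eps = rng.normal(0.0, sigma, size=2)        # Step 2: tapping error
        x = theta + eps                             # Step 3: candidate response
        if in_shape(x):
            return x                                # Step 4: output
    raise RuntimeError("no feasible response found; check sigma and the shape")

# Hypothetical example: a 10 x 4 rectangle with a horizontal axis.
def in_rect(p):
    return 0.0 <= p[0] <= 10.0 and 0.0 <= p[1] <= 4.0

rng = np.random.default_rng(0)
axis = np.column_stack([np.linspace(2, 8, 61), np.full(61, 2.0)])
taps = np.array([sample_response(axis, 1.5, in_rect, rng) for _ in range(500)])
```

Note that only the error is redrawn when a candidate falls outside the shape; the ideal point stays fixed, as STEP 3 specifies.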

**θ** in STEP 1 can be viewed as an ideal point on the model's structure that a participant would like to touch. This ideal point is not observable, and therefore **θ** is regarded as a latent vector. It is **θ** plus some response error that is observable. Here, the error follows a truncated bivariate normal distribution that guarantees the observation **X** to be in the feasible region *S*. Note that *σ*^{2} is the only unknown parameter in this model that quantifies the variation of the deviation from the axes.

Based on this model, the marginal distribution of **X** has a probability density function defined for **x** = (*x*_{1}, *x*_{2}), in which |*T*| denotes the length of axis *T*. Given responses **x**_{1}, ..., **x**_{n} from *n* participants, the log-likelihood function of the unknown parameter *σ*^{2} is the sum of the log densities of the individual responses, and the estimate of *σ*^{2} is then obtained by maximizing this function.
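For reference, the density and log-likelihood implied by STEPS 1 to 4 can be written out as follows. This is a reconstruction from the generative description and not a copy of the original display; the symbols $\phi_\sigma$ (bivariate normal density with independent components) and $C_\sigma(\boldsymbol{\theta})$ (the probability that an untruncated response centered at $\boldsymbol{\theta}$ falls in $S$) are notation introduced here.

```latex
\begin{align}
f(\mathbf{x} \mid \sigma^2) &= \frac{1}{|T|} \int_T
  \frac{\phi_\sigma(\mathbf{x} - \boldsymbol{\theta})}{C_\sigma(\boldsymbol{\theta})}
  \, d\boldsymbol{\theta}, \qquad \mathbf{x} \in S, \\
\phi_\sigma(\mathbf{u}) &= \frac{1}{2\pi\sigma^2}
  \exp\!\left(-\frac{u_1^2 + u_2^2}{2\sigma^2}\right), \qquad
C_\sigma(\boldsymbol{\theta}) = \int_S \phi_\sigma(\mathbf{s} - \boldsymbol{\theta}) \, d\mathbf{s}, \\
\ell(\sigma^2) &= \sum_{i=1}^{n} \log f(\mathbf{x}_i \mid \sigma^2), \qquad
\hat{\sigma}^2 = \arg\max_{\sigma^2 > 0} \, \ell(\sigma^2).
\end{align}
```

The division by $C_\sigma(\boldsymbol{\theta})$ reflects the repetition of STEP 2 until the response falls in $S$, which renormalizes the error distribution separately for each ideal point.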

**c** = (*c*_{1}, *c*_{2}) denotes the center of mass of the shape. The proposed model assumes that each response is generated as follows:

- STEP 1: Sample a binary variable *D* ∈ {0, 1}, with *P*(*D* = 1) = *p*.
- STEP 2: If *D* = 1, let **θ** = **c**. Otherwise, sample a point **θ** = (*θ*_{1}, *θ*_{2}) uniformly from *T*.
- STEP 3: If *D* = 1, sample a two-dimensional error **ε** = (*ε*_{1}, *ε*_{2}), where *ε*_{1} and *ε*_{2} are independently sampled from a normal distribution with mean 0 and standard deviation *σ*_{1} or from a t-distribution with mean 0, scale parameter *σ*_{1}, and degree-of-freedom *ν*. Otherwise, if *D* = 0, sample **ε** = (*ε*_{1}, *ε*_{2}), where *ε*_{1} and *ε*_{2} are independently sampled from a normal distribution with mean 0 and standard deviation *σ*.
- STEP 4: Let **X** = **θ** + **ε**. If **X** is within the feasible region *S*, stop. Otherwise, repeat STEP 3 until **X** falls in *S*.
- STEP 5: Output **X** as the response.
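The five steps above can be sketched in the same way as a two-component sampler. The function name `sample_mixture_response`, the discretized axis, the `max_tries` guard, the example parameter values, and the choice to return the latent indicator *D* alongside the response (it would not be observed in practice) are assumptions made for illustration.

```python
import numpy as np

def sample_mixture_response(axis_pts, center, p, sigma, sigma1, in_shape,
                            rng, nu=None, max_tries=10_000):
    """Draw one response following Steps 1-5.

    nu=None uses normal errors around the center; otherwise each error
    component is a t variate with nu degrees of freedom scaled by sigma1.
    Returns the response and the latent indicator D (True = center).
    """
    d = bool(rng.random() < p)                                   # Step 1
    theta = center if d else axis_pts[rng.integers(len(axis_pts))]  # Step 2
    for _ in range(max_tries):
        if d and nu is not None:                                 # Step 3
            eps = sigma1 * rng.standard_t(nu, size=2)
        elif d:
            eps = rng.normal(0.0, sigma1, size=2)
        else:
            eps = rng.normal(0.0, sigma, size=2)
        x = theta + eps                                          # Step 4
        if in_shape(x):
            return x, d                                          # Step 5
    raise RuntimeError("no feasible response found; check scales and shape")

# Hypothetical example: a 10 x 4 rectangle whose center of mass is (5, 2).
def in_rect(p):
    return 0.0 <= p[0] <= 10.0 and 0.0 <= p[1] <= 4.0

rng = np.random.default_rng(0)
axis = np.column_stack([np.linspace(2, 8, 61), np.full(61, 2.0)])
center = np.array([5.0, 2.0])
draws = [sample_mixture_response(axis, center, 0.3, 0.5, 1.0, in_rect, rng, nu=1)
         for _ in range(1000)]
taps = np.array([x for x, _ in draws])
share_center = float(np.mean([d for _, d in draws]))
```

Because *D* is drawn before any rejection takes place, the share of center-directed responses stays close to *p* regardless of how often STEP 3 has to be repeated.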

Under this model, a participant either takes the center of mass as the ideal response point (with probability *p*) or randomly chooses a point on a model's structure (with probability 1 – *p*). Then, response error is generated depending on whether the ideal point is the center or sampled from the axis. We considered two response error distributions in the analysis, namely, a truncated normal distribution and a truncated t-distribution with one degree of freedom. The log-likelihood function for this model is a function of the unknown parameters, including *σ*^{2} and *p*; when the t-distribution is used, its density has *ν* degrees of freedom (*ν* > 0). Similar to the analysis of Model 1, parameter estimation and model comparison were conducted by making use of the log-likelihood function.
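The mixture density implied by STEPS 1 to 5 can be written out as a sketch. As before, this is a reconstruction from the generative description, not the original display; $\phi_\sigma$, $C_\sigma$, $h_{\sigma_1}$, and $t_\nu$ are notation introduced here, and the parameter vector is written to include $\sigma_1$ because STEP 3 uses it.

```latex
\begin{align}
f(\mathbf{x}) &= p \,
  \frac{h_{\sigma_1}(\mathbf{x} - \mathbf{c})}{\int_S h_{\sigma_1}(\mathbf{s} - \mathbf{c}) \, d\mathbf{s}}
  + (1 - p) \, \frac{1}{|T|} \int_T
  \frac{\phi_\sigma(\mathbf{x} - \boldsymbol{\theta})}{C_\sigma(\boldsymbol{\theta})}
  \, d\boldsymbol{\theta}, \qquad \mathbf{x} \in S, \\
\phi_\sigma(\mathbf{u}) &= \frac{1}{2\pi\sigma^2}
  \exp\!\left(-\frac{u_1^2 + u_2^2}{2\sigma^2}\right), \qquad
C_\sigma(\boldsymbol{\theta}) = \int_S \phi_\sigma(\mathbf{s} - \boldsymbol{\theta}) \, d\mathbf{s}, \\
h_{\sigma_1}(\mathbf{u}) &= \phi_{\sigma_1}(\mathbf{u})
  \quad \text{or} \quad
h_{\sigma_1}(\mathbf{u}) = \prod_{j=1}^{2} \frac{1}{\sigma_1} \, t_\nu\!\left(\frac{u_j}{\sigma_1}\right), \\
t_\nu(z) &= \frac{\Gamma\!\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi} \, \Gamma\!\left(\frac{\nu}{2}\right)}
  \left(1 + \frac{z^2}{\nu}\right)^{-\frac{\nu + 1}{2}}, \qquad
\ell = \sum_{i=1}^{n} \log f(\mathbf{x}_i).
\end{align}
```

Setting $p = 0$ removes the center component and recovers the single-structure model, so the two models are nested.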