Open Access
Article  |   February 2016
Numerosity as a topological invariant
Journal of Vision February 2016, Vol.16, 30. doi:10.1167/16.3.30
      Tobias Kluth, Christoph Zetzsche; Numerosity as a topological invariant. Journal of Vision 2016;16(3):30. doi: 10.1167/16.3.30.

Abstract

The ability to quickly recognize the number of objects in our environment is a fundamental cognitive function. However, it is far from clear which computations and which actual neural processing mechanisms are used to provide us with such a skill. Here we try to provide a detailed and comprehensive analysis of this issue, which comprises both the basic mathematical foundations and the peculiarities imposed by the structure of the visual system and by the neural computations provided by the visual cortex. We suggest that numerosity should be considered as a mathematical invariant. Making use of concepts from mathematical topology—like connectedness, Betti numbers, and the Gauss–Bonnet theorem—we derive the basic computations suited for the computation of this invariant. We show that the computation of numerosity is possible in a neurophysiologically plausible fashion using only computational elements which are known to exist in the visual cortex. We further show that a fundamental feature of numerosity perception, its Weber property, arises naturally, assuming noise in the basic neural operations. The model is tested on an extended data set (made publicly available). It is hoped that our results can provide a general framework for future research on the invariance properties of the numerosity system.

Introduction
Information about the number of objects in the environment can be extracted in a fast and effortless fashion by the visual systems of humans and other species. Access to this information is a crucial factor in evolution: How many predators am I confronted with? From which tree can I get the larger amount of food? Questions like these show that survival can essentially depend on having access to this type of knowledge. It is therefore not surprising that the estimation of number—or of its approximation, numerosity (Brannon, 2006)—is also considered to be a fundamental property of cognition. 
Humans recognize numbers rapidly and precisely up to four (“subitizing”; Kaufman, Lord, Reese, & Volkmann, 1949), but we also estimate larger numbers in a rapid fashion, although with increasing errors. A typical characteristic of the numerosity system is that the errors depend on the set size, in accordance with the Weber–Fechner law (Gallistel & Gelman, 1992). Whether these characteristics support the idea of two clearly distinct subsystems or reflect different operation modes of a general number system is still under debate (Feigenson, Dehaene, & Spelke, 2004; Piazza, Mechelli, Butterworth, & Price, 2002; Ross & Burr, 2010). Not only has estimation of numerosity been reported for human adults (Whalen, Gallistel, & Gelman, 1999), but it has also been shown that even 6-month-old infants were able to perform a number-distinction task (Xu & Spelke, 2000). Investigation of the development of mathematical competence and the ability for numerosity estimation in children suggests that mathematical ability is correlated with acuity in numerosity estimation (Halberda, Mazzocco, & Feigenson, 2008). But number estimation is not restricted to humans with mature cognitive abilities—it has also been found in infants and animals (Brannon, 2006; Nieder, Freedman, & Miller, 2002), and recently even in invertebrates (Gross et al., 2009). 
In general, numerosity is one of the most abstract properties in the environment, and its perception is almost independent of spatial attributes like orientation (Allik, Tuulmets, & Vos, 1991) and of the shape of the objects (Strauss & Curtis, 1981). It is also not confined to a specific sensory modality (Starkey, Spelke, & Gelman, 1990). And finally, numerosity estimation extends beyond the estimation of the cardinality of objects. The quantity of physical properties like sound volume, space, and time shows similar characteristics (Bonn & Cantlon, 2012). This suggests that there might exist a generalized system for magnitude estimation (Gallistel & Gelman, 2000; Walsh, 2003). 
The standard view of cortical organization as a local-to-global processing hierarchy (Hegde & Felleman, 2007) which goes from basic sensory properties toward the representation of the most abstract properties on top of the hierarchy would suggest that numerosity has to be considered a very high-level feature. On the other hand, single-cell recordings show that neural reactions to numerosity are quite fast, approximately 100 ms in macaques (Roitman, Brannon, & Platt, 2007). Likewise, human evoked potentials show number-specific responses as early as 75 ms (Park, DeWind, Woldorff, & Brannon, 2015). This indicates that number processing starts at a relatively early level. 
The reaction times in numerosity-estimation tasks are independent of the number of elements, suggesting that numerosity is processed in parallel (Mandler & Shebo, 1982). Physiological results also argue for a parallel extraction of numerosity (Nieder et al., 2002). In addition, there is evidence for a “direct visual sense for number,” since number seems to be a primary visual property like color, orientation, or motion, to which the visual system can be adapted by prolonged viewing (Ross & Burr, 2010). And finally, there is an ongoing debate about whether we have a true sense of number (Anobile, Cicchini, & Burr, 2014; Ross & Burr, 2010) or whether our apparent number sense is in fact just a variant of texture perception, namely the perception of texture density (Dakin, Tibber, Greenwood, Kingdom, & Morgan, 2011; Durgin, 2008; Raphael, Dillenburger, & Morgan, 2013). 
All this shows that the understanding of numerosity should be regarded as a constitutive element for our understanding of perception and cognition. And for this it is of obvious importance to understand how numerosity is computed by the visual system. However, there is as yet no agreement on a canonical structure for models which address the problem of numerosity perception. Rather, there exists a variety of different modeling approaches (e.g., Dakin et al., 2011; Dehaene & Changeux, 1993; Meck & Church, 1983; Stoianov & Zorzi, 2012; Verguts & Fias, 2004; Zetzsche & Barth, 1990b), and it is unclear how exactly they are related to each other and whether they all share some common basis. It is also important to note that some of these models cannot be considered as full computational models that can be realized in a neurobiologically realistic fashion. Such models have to account for the complete processing chain, from the retinal image through neurobiologically plausible transformation stages to the final number assignment. 
To our knowledge, the first model that matched these criteria was suggested by Zetzsche and Barth (1990b), and their earlier results constitute an important basis for our present analysis. A very influential model that also matches the criteria (except that it is only a 1-D model) has been suggested by Dehaene and Changeux (1993). This model is based on different processing stages, with the essential components being difference-of-Gaussian filters of different sizes and an accumulation system. These filters restrict the model to representing all stimuli as blob-like features, which limits its invariance properties with respect to the shape of the elements in a stimulus. Whether human observers show related deviations from invariance which can be attributed to a blob-like representation remains to be determined. 
A more recent model by Dakin et al. (2011) uses texture density computed by a ratio of pooled high- and low-spatial-frequency filter outputs. Assuming that the high spatial frequencies are largely determined by the number of objects, an estimate for numerosity is determined by the product of area and density. By definition, the pooled high-frequency output depends on the length of the object contours presented in the stimulus. The model tests so far have used stimuli consisting of similar objects, such that numerosity is approximately proportional to accumulated contour length. The degree of invariance of the model with respect to comparisons involving elements of very different shape, and therefore substantially different contour length, has not yet been systematically tested. 
Another model which can be seen to match the criteria, by Stoianov and Zorzi (2012), is based on unsupervised learning in a deep recurrent network. Neural-network models are very valuable insofar as they demonstrate that a capability like numerosity perception can indeed be learned by a biological system (Hornik, Stinchcombe, & White, 1989). The training images were binary and the elements presented were rectangles, triangles, or ovals, so that only moderate shape variations were investigated. Whether this model and its abstract mathematical counterpart (Cappelletti, Didino, Stoianov, & Zorzi, 2014; Stoianov & Zorzi, 2012) can provide the desired invariance properties for arbitrarily shaped elements thus remains to be determined. 
In our view, this situation suggests that a systematic account of the logical, mathematical, and computational requirements for a sense of number is highly desirable. A formal analysis through a hierarchy of abstraction similar to Marr's (1982) approach and a discussion of the relation to the philosophy of mathematics can be found elsewhere (Kluth & Zetzsche, 2015). Simplified early variants of the model which represent objects as “right-angled” polygons and make use of a generalized eigenvalue operation have also already been described (Zetzsche & Barth, 1990b; Zetzsche, Gadzicki, & Kluth, 2013). First results on the present version of the model have been presented in a conference paper (Kluth & Zetzsche, 2014). In this article the focus is on the invariance properties of the model, on its neural realization, and on its quantitative predictions regarding human behavior. 
The article is organized as follows. We start with a specification of the preconditions and the general problem statement, and then present the specific questions that have to be answered. In the following section we describe our approach to the problem: Numerosity should be considered as a mathematical invariant, and the zeroth Betti number, a specific topological invariant, should be considered as the ideal solution. We then derive a realistic approximation of the ideal solution by considering the image luminance function as a curved surface and by making use of the Gauss–Bonnet theorem to compute the number of simply connected components in the image. We next consider the neurobiological plausibility of the suggested computations and show that the required hardware can be assumed to be available in the visual cortex. The model is then tested for a variety of differently shaped elements arranged in varying configurations. For these tests, we provide an extended data set, which will also be made publicly available, containing diverse shape variations. In the next section we investigate the relation of the model to human behavior. Reasonable assumptions about neural noise are shown to lead to valid predictions about behavior in different tasks and about the corresponding Weber fractions. The article closes with a discussion in which we compare our model to other suggested models, identify the different invariance properties of the models, and consider what testable predictions can be deduced from this comparison. 
Mathematical principles and models
Definition of the problem
The common mathematical basis for numbers is given in set theory. The cardinality of a set is closely related to our understanding of what is meant by numerosity, as it describes how many elements the set contains. But representations of numbers seem to be somewhat arbitrarily chosen, because numbers are just a formal construction to provide a label for the equivalence classes of the relation “having the same cardinality.” The definitions of the natural numbers and their Arabic numerals are both concepts which can have an effect on numerical cognition. However, they do not provide a basis for the mental representation of numbers, as species without access to these concepts are also able to estimate the number of objects in a configuration. Another problem is that the concept of cardinality in set theory is based on a clear distinction between the elements within a set, and this cannot be assumed as a general property of the perceptual process. As a consequence, set theory does not provide the tools and concepts to properly describe how the number of objects in the real world is derived by perceptual processes from basic properties of these objects. The biological representation of number derived from perceptual processes is also referred to as numerosity. 
In the context of the perception of numerosity we thus have to consider a perspective which is different from the abstract mathematical realm. Here we are confronted with the problem of inferring numerosity from the physical world. Constitutional aspects of this problem are the notions of space and object. We envisage space here as the continuous real-valued space ℜ³. We further assume that this 3-D space is not completely empty but contains a configuration of matter—i.e., at each position (x, y, z) we have either some matter of type m_j (with multiple assignments not allowed) or empty space (vacuum). Is it possible to assign in a meaningful way a numerosity to this spatial configuration of matter? And what are the requirements for such an assignment? The critical concept here is the notion of an object. We think here of ordinary objects like dogs, trees, tables, and so on. But what makes up an object formally? At least since Descartes, a very common conception of a physical object (body) is that of a contiguous bounded region of matter in three-dimensional space. This region is distinguished from its surround. The approach of establishing objects by such a contiguous region of matter is also supported from a perceptual point of view by evidence from gestalt theory. The connectedness of pieces strongly affects grouping into one object (Palmer & Rock, 1994). Moreover, it has been shown that this kind of connectedness has an effect on numerosity estimation (Franconeri, Bemis, & Alvarez, 2009; He, Zhang, Zhou, & Chen, 2009). One critical issue here is how the distinction between object and surround is established. For the context of this article we will not further pursue this question; we will just assume that this is achieved in some reasonable fashion. An object is thus restricted to a connected subset of the real world, which means that it cannot be represented by a union of disjoint subsets. 
The world in which we live can be assumed to be four-dimensional if we consider time as an additional dimension. Within the context of this article we restrict the world and the objects to be static, such that they can be represented by and within a three-dimensional vector space, the real-valued space ℜ³. We then have our configuration of one or more contiguous regions of 3-D space, and we can assign a label—the objecthood label—to each point (x, y, z) of the interior of the region belonging to an object. We further assume that there exists something like a background in the form of the complement of the union of all object regions (in physical terms this could be air, water, vacuum, etc.), and we can assign the background label to all points of this background. For simplicity, let us also assume that any two objects can come arbitrarily close together but can never have full contact, so that there is always at least a tiny bit of background in between them. A configuration c(x, y, z) can then be defined by the associated objecthood function. In mathematical terms this means that if multiple objects O_i are given as subsets of ℜ³, building a configuration set C = ∪_i O_i, the objecthood function defining a configuration becomes the characteristic function with respect to C—i.e., c(x, y, z) = χ_C(x, y, z). Figure 1 shows an example of a possible configuration. 
Figure 1
 
A configuration of three-dimensional simply connected objects. The shown objects (mushrooms) differ in their position, size, and shape. The main question is how the numerosity of objects can be determined from such a configuration. (Note that, strictly speaking, this illustration gives only an idea of the three-dimensional case, since the projection to a two-dimensional image already amounts to estimation via sensors.) There exist multiple possible approaches for how the numerosity can be estimated. If time is not limited, one may sequentially count the objects. For the case of up to four objects, the number can be determined immediately in a process named subitizing. We want to deal with the full range of numbers, and we therefore investigate the approximate number sense, which estimates numerosity under a strict time limitation.
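As a small illustration of the objecthood function just defined, a configuration can be sampled on a discrete grid; the following sketch (our own construction, with two hypothetical ball-shaped objects) builds the characteristic function χ_C:

```python
import numpy as np

# Discrete sample of R^3: a 64^3 occupancy grid over [-1, 1]^3.
x, y, z = np.mgrid[-1:1:64j, -1:1:64j, -1:1:64j]

# Two disjoint ball-shaped objects O_1 and O_2.
ball1 = (x + 0.5)**2 + y**2 + z**2 < 0.1
ball2 = (x - 0.5)**2 + (y - 0.2)**2 + z**2 < 0.05

# Objecthood function c = chi_C: 1 on objects, 0 on the background.
c = (ball1 | ball2).astype(int)
```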
Having specified the formal conditions so far, we now address the following questions: 
  1.  
    How can we assign a numerosity to this configuration in a mathematically reasonable way?
  2.  
    How can we compute this numerosity from visual measurements?
  3.  
    How can this computation be realized with the available neural hardware of the human visual cortex?
  4.  
    How does all this relate to the human perception of numerosity?
Numerosity is an invariant
As a first step we consider the mathematical problem behind the number-assignment problem addressed by the first question. Let us consider the set 𝒞, which has as elements all possible configurations c(x, y, z)—i.e., all the configurations of objects in the sense specified earlier. The relation ∼ defines equinumerosity: Two configurations are equinumerous if each has the same numerosity. The set 𝒞 is then structured into subsets according to the equinumerosity relation ∼, which means that each subset is an equivalence class with respect to ∼. Let us define the property N(c(x, y, z)) as the numerosity of objects in the configuration c. Since it is true that whenever c_i ∼ c_j, then N(c_i) = N(c_j) for two configurations c_i and c_j, the property N—the numerosity—is an invariant under the relation ∼. Note that the fundamental difference from the classical set-theoretical approach is as follows. Given a configuration c with respect to a union of objects O_i, we only have access to C = ∪_i O_i. But determination of cardinality in a set-theoretical manner would require access to {O_1, …, O_n}—i.e., an explicit distinction between the objects. 
As a next step we have to take a closer look at the specific nature of the invariant N. Invariant properties are usually specified with respect to transformations T which map an object configuration to another one. Which transformations T have to be considered? The simplest class obviously consists of changes of position (Figure 2a), and of course the numerosity N(c(T[x, y, z])) should not be influenced by such transformations T. The same is true for other geometric properties, like size and orientation changes (Figure 2b, c), and in general every affine geometric transformation T should be admissible while leaving the number value invariant. However, geometric transformations are not sufficient, since in addition to changes of the geometric properties of the objects it should also be possible to change their shapes. Is there a class of transformations which enables an arbitrary change of the shapes of the objects? And what is the appropriate mathematical formalism that can provide suitable invariants with respect to this class of transformations? 
Figure 2
 
The different kinds of invariance properties required for numerosity estimation are illustrated in the two-dimensional case. The right column shows the combination of different operations, with increasing complexity from top to bottom. The kinds of operations are partitioned into two groups: geometric invariances and topological invariances. Geometric invariances are defined with respect to operations which change position, size, and orientation of the objects. Topological invariances are defined with respect to operations which change the shape of the object (formally defined with respect to homeomorphisms). Each topological invariance is automatically a geometric invariance.
We suggest that the appropriate formalism is provided by topology. Loosely stated, topology is the mathematical discipline that deals with those problems that do not depend on the exact shape of the objects and configurations involved. The topological structure of the support of a configuration (or more generally, a topological space) is described by the series of Betti numbers. The kth Betti number is the rank of the kth simplicial cohomology group. A more intuitive interpretation of these numbers is that they measure the number of k-dimensional holes of the space. The zeroth Betti number b_0 is the most interesting one with respect to numerosity estimation, as it measures the number of connected components. Each configuration consists of multiple contiguous objects, so that the zeroth Betti number of the support C is equivalent to the number of objects—i.e., N(c(x, y, z)) = b_0(C). 
As a side remark, it is of interest whether the proposed invariant is a topological invariant. In this case, the transformations would be restricted to homeomorphisms—i.e., continuous and bijective mappings whose inverse is also continuous. This class of operations not only includes the geometric transformations illustrated in Figure 2a through c but also allows the complete change of object shapes as illustrated in Figure 2d. Homeomorphisms are structure conserving in the sense that the kind of connectedness does not change. Cutting, tearing, and gluing of objects, for example, are in general not homeomorphisms and could cause a change of a topological invariant. The proposed invariant, the zeroth Betti number, is a topological invariant insofar as it does not change under homeomorphisms. However, in a sense the zeroth Betti number can be seen as a stronger invariant, as it does not change its value for a broader class of transformations, for example the mapping from a sphere to a torus. This distinguishes it from another, weaker topological invariant, the Euler characteristic, which we will later consider in more detail. The most general requirement for the invariance of the zeroth Betti number is only that each object remain somehow connected. 
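To make this contrast concrete (a worked comparison using only standard facts): a closed orientable surface with g handles has Euler characteristic χ = 2 − 2g, so a sphere has χ = 2 while a torus has χ = 0. The zeroth Betti number, in contrast, is b_0 = 1 for both, since each is a single connected piece; a sphere-to-torus mapping thus changes χ but leaves b_0 untouched. 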
Now that we have specified the formal solution to the problem with the concept of the zeroth Betti number, we next have to consider the computational aspects: What kind of algorithms exist for the computation of b_0? How can we apply them to real sensory data? In answering these questions we will first proceed through some more abstract levels before finally deriving a concrete model for estimating numerosity from a discrete two-dimensional intensity image. 
We start with the consideration of algorithms. In digital topology—i.e., the research area dealing with the computation of topological properties in image processing—there exists a class of closely related algorithms referred to as connected component labeling. As the name states, the algorithms are based on a labeling strategy, and the number of connected components results as a by-product of the number of different labels that have been assigned on termination of the algorithm. The main problem in making use of these algorithms in the context of numerosity is that our current knowledge of numerosity perception assumes a parallel, almost instantaneous process for its computation. To our knowledge, there exist no hints of a specific temporal dependence of the reaction times in numerosity estimation, with the only exception of sequential or mental counting, which is assumed to rely on different mechanisms (Mandler & Shebo, 1982; Trick & Pylyshyn, 1994). However, all known algorithms from digital topology for computing the number of connected components have a substantial serial component. Many of them show a direct run-time dependence on the number of objects (see, e.g., He, Chao, Suzuki, & Wu, 2009), whereas it is a hallmark of numerosity perception that the reaction times are independent of the number of objects in the stimulus. At present it is thus not possible to determine the number by a connected-component-labeling algorithm which is compatible with our current knowledge on numerosity perception and the neural architecture of the visual cortex. (Note that this does not rule out the possibility that new insights on the former or the latter, or both, will turn up in the future. The issue should thus not be removed from the research agenda.) 
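For reference, this is how such a labeling computation looks in practice; a minimal sketch using scipy.ndimage.label (our choice of tool, not part of the original work), which implements exactly the kind of essentially serial labeling strategy discussed above:

```python
import numpy as np
from scipy.ndimage import label

img = np.zeros((8, 8), dtype=int)
img[1:3, 1:3] = 1                  # first connected component
img[5:7, 4:7] = 1                  # second connected component

labels, num = label(img)           # serial connected-component labeling
print(num)                         # b_0 = 2 connected components
```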
So how can we derive a neurobiologically realistic—that is, parallel—algorithm for the computation of numerosity? For this, we have to accept a minor restriction with respect to the most general, ideal solution. An important special case results from the assumption that all objects are approximately simply connected—i.e., objects have no holes. In this case the zeroth Betti number equals another important topological invariant, the Euler characteristic. Although the Euler characteristic is a somewhat weaker invariant, due to being a topological invariant it will remain constant if any homeomorphism is applied to a simply connected object. This allows us to assign a constant value to a wide variety of different shapes (cf. Figure 2). This invariant is of special importance in the current context, since there exists a famous theorem which allows its parallel computation from local measurements. 
The Gauss–Bonnet theorem provides a connection between topology and differential geometry and relates the Gaussian curvature of two-dimensional surfaces to their Euler characteristic. It thus becomes possible to represent the number in terms of the Euler characteristic. We first provide the theorem in its basic form for the case of simply connected objects, which we assume to have closed smooth surfaces: 
Theorem 1 (Gauss–Bonnet, basic version) Let S ⊂ ℜ³ be a closed surface (of class C³) without boundary; then

∫_S K dA = 2π χ(S),   (1)

where K is the Gaussian curvature and χ is the Euler characteristic.  
How can we understand the invariance properties provided by the Gauss–Bonnet theorem? The Gaussian curvature K of a regular surface is defined as the product of its principal curvatures—the minimal and maximal normal curvatures, obtained in two orthogonal planes that are both orthogonal to the tangent plane of the surface. The local shape of a surface can be classified by its Gaussian curvature into elliptic (K > 0), hyperbolic (K < 0), and parabolic (K = 0) parts. For general shapes, the invariance is provided by (a) a trade-off between area and curvature, (b) a trade-off between elliptic and hyperbolic regions, and (c) the fact that parabolic regions never produce any contribution, independent of their spatial extension. 
Let us consider, as a special and particularly simple case, the surface S of a sphere, as shown in Figure 3. It is completely elliptic (light blue); the Gaussian curvature is constant everywhere and depends only on the radius of the sphere. The interplay between the surface area and the Gaussian curvature is just a multiplication. If the radius is decreased (top of Figure 3b, c), the surface area decreases while at the same time the Gaussian curvature increases. The resulting surface integral—i.e., the product of Gaussian curvature and surface area—is always 4π, independent of the size of the sphere (the Euler characteristic χ(S) of any sphere is 2). Note that the deformations of the sphere are exactly the topological transformations previously considered. As decreasing the radius of a sphere is a homeomorphism, Theorem 1 states that the integral over the Gaussian curvature is constant. The surface integral also does not change if the sphere is dented, as can be seen in the middle row of Figure 3b and c. Here a hyperbolic (red) curvature emerges at the transition region between sphere and dent. Since the curvature does not change except in the region of the dent, curvature with a negative sign always implies the emergence of another elliptic part with a higher absolute value. We can find this elliptic part in the middle of the dent, which can be seen in Figure 3c. The Gaussian curvature of a regular surface is a continuous function, and thus there exists at least one curve between hyperbolic and elliptic parts on the surface where the curvature is parabolic. The influence of a homeomorphism producing a bulge on the surface of a simply connected object and its Gaussian curvature is also shown in Figure 3. 
Figure 3
 
(a) The surface of a three-dimensional sphere. (b) The sphere in (a) is transformed via homeomorphisms into a smaller sphere (top), a smaller sphere with a dent (middle), and a smaller sphere with a bulge (bottom). (c) The Gaussian curvature of the 3-D objects in (b), shown qualitatively by coloring the surface by the kind of curvature. The elliptic parts of the surface are blue, the hyperbolic parts are red, and the parabolic parts are white. The color saturation shows the amount of curvature contributed to the integral in the Gauss–Bonnet theorem. This illustrates the meaning of the Gauss–Bonnet theorem. The distribution of the Gaussian curvature of the surface is strongly affected by the applied homeomorphism, but the integral over the whole surface of this curvature is independent of the homeomorphism's effect. (Adapted from Kluth & Zetzsche, 2014.)
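To spell out the sphere case as a worked instance of Theorem 1 (using only standard facts): a sphere of radius r has constant Gaussian curvature K = 1/r² and surface area 4πr², so ∫_S K dA = (1/r²) · 4πr² = 4π = 2π χ(S), with χ(S) = 2 and independent of r. 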
The invariant provides the desired property that each object, irrespective of its shape, will generate the same contribution. But this is not sufficient, since for measuring the number of objects of a configuration we also want the contributions of all objects to be summed up. The linearity of the integral operator directly yields this important property, the additivity of the invariant. Assume S = ∪_{i=1}^{n} ∂O_i to be the union of surfaces of pairwise disjoint simply connected objects O_i. With χ(∂O_i) = 2, i = 1, …, n, the integral becomes

∫_S K dA = Σ_{i=1}^{n} ∫_{∂O_i} K dA = 2π Σ_{i=1}^{n} χ(∂O_i) = 4πn.   (2)
The invariant of a configuration of n objects thus amounts to just n times the basic invariant. We can therefore sum up all local curvature measurements of a configuration of simply connected objects without any further a priori knowledge to obtain the number of objects of the configuration. In principle, the problem of estimating the number of objects in a scene is thereby solved. 
Sensor implementation
The challenging part of the information-gathering process is thus how the local Gaussian curvature can be estimated by a sensory modality. Tactile scanning of the object's surface and visual sensing are both possible sensor strategies. Multisensory approaches also seem possible, and with technical depth-sensing devices, as used in robots, the number of possibilities becomes quite high. However, in this article we restrict our investigation to the human visual system. Here we can distinguish two basic possibilities. First, we can assume the existence of some sort of 3-D representation at an intermediate stage of the visual system. Then it would in principle be possible to derive the local Gaussian curvature of the surface of an object from this 3-D representation and make direct use of Theorem 1 for computing numerosity. In our view, this is a possibility that should indeed not be neglected. However, the representation of 3-D information in the visual system is not yet completely understood, and the additional processing time seems to be in conflict with the very fast reaction times at which numerosity can be estimated. We thus consider here a second, more plausible possibility: direct estimation of numerosity from the 2-D luminance function l = l(x, y). 
Given that the solution by the Gauss–Bonnet theorem is specified in terms of surfaces in 3-D space, making use of it for 2-D luminance images requires looking at two aspects. First, we have to consider the relation between the 3-D spatial configuration and the resulting 2-D image. And second, we have to investigate how the formalism can be adapted to the processing of 2-D luminance functions. An example of the luminance function of a weakly illuminated 3-D cube is illustrated in Figure 4a. The interplay between lighting and the reflectance properties of the object's surface results in a mapping from the real world to the sensed 2-D luminance function. The details of this mapping have been investigated by, e.g., Phong (1975) and Schlick (1993), and its relation to human perception has been analyzed by Koenderink and van Doorn (2003). However, for our investigations we consider only basic versions of this mapping. The sensor operator G maps the physical properties of a configuration c to the luminance function l such that the sensory process becomes G(c) = l. With the simplest assumption that G is an orthogonal projection not incorporating lighting, the resulting luminance level is always constant for the objects. If we assume planar surfaces to be projected this way, the resulting setup matches the visual stimuli commonly used in psychological studies (e.g., He, Zhang, et al., 2009; Piazza, Izard, Pinel, Le Bihan, & Dehaene, 2004). In particular, the case of objects whose projections are right-angled polygons was considered in previous work (Zetzsche et al., 2013; Zetzsche & Barth, 1990b).  
Figure 4
 
(a) A weakly illuminated three-dimensional cube. (b) The luminance function of the image in (a) as a two-dimensional surface embedded in the three-dimensional space. Application of differential operators to the luminance surface becomes numerically unstable because of emerging discontinuities, which require the incorporation of additional arcs in the general solution.
However, we also want to consider the case in which G takes lighting of the objects into account (such that it does not result in a constant luminance level of the objects' projections). For this, we additionally assume that the operator G preserves the differentiability on the objects' surface. The configuration c is thus mapped to an almost-everywhere sufficiently smooth function l. The background is assumed to be uniform and clearly separated from the objects. The discontinuity between objects and the environment is thus projected to the luminance function (compare Figure 4a). 
With these assumptions it becomes easy to apply the Gauss–Bonnet theorem to the luminance function in order to estimate the numerosity of perceived objects—i.e., to relate the Euler characteristic to local features of the luminance function. The only conceptual step needed is to consider the luminance surface Ω ⊂ ℜ³, which is defined by the perceived luminance function l—i.e., Ω := {(x, y, z) ∈ ℜ³ | x, y ∈ ℜ, z = l(x, y)}. For example, in Figure 4b the luminance surface of the luminance function in Figure 4a is shown. 
However, because the luminance surface of an object is no longer a closed surface, but has a boundary, we have to consider the complete version of the Gauss–Bonnet theorem: 
Theorem 2 (Gauss–Bonnet, complete version) Let S ⊂ ℜ³ be a regular oriented surface (of class C³), and let R be a compact region of S with the boundary ∂R. Suppose that ∂R is a simple, closed, piecewise regular, positively oriented curve. Assume that ∂R consists of k regular arcs ∂R_i (of class C²), and let θ_i be the external angles of the vertices of ∂R. Then

∫_R K dA + Σ_{i=1}^{k} ∫_{∂R_i} κ_g ds + Σ_{i=1}^{k} θ_i = 2π χ(R),   (3)

where K is the Gaussian curvature, κ_g is the geodesic curvature, and χ is the Euler characteristic.  
Theorem 2 provides a solution which incorporates the luminance surface properties directly. Assuming a smooth space curve Γ which is the boundary of a closed region S on the luminance surface Ω—i.e., Γ = ∂S—we have

∫_S K dA + ∫_Γ κ_g ds = 2π χ(S),   (4)

where κ_g denotes the geodesic curvature and the Gaussian curvature K is given by

K = (l_xx l_yy − l_xy²) / (1 + l_x² + l_y²)²,   (5)

where the subscript letters denote the derivative in the respective direction. The consideration of the space curve Γ is necessary because the integral of the Gaussian curvature over the whole luminance surface of arbitrary luminance functions would result in the same quantity. This means that in this case all images have the same Euler characteristic (Barth, Ferraro, & Zetzsche, 2001). Using the parametrization ϕ(x, y) := (x, y, l(x, y))ᵀ of the surface S ⊂ Ω and the Gaussian curvature given by Equation 5, the first integral can be computed by

∫_S K dA = ∫_{ℜ²} [(l_xx l_yy − l_xy²) / (1 + l_x² + l_y²)^{3/2}] χ_S(x, y) d(x, y),   (6)

where χ_S is the characteristic function with respect to the set S. In order to calculate the second integral, the geodesic curvature of the boundary curve Γ must be estimated. Using the parametrization θ : I → ℜ³ with θ(t) := (x(t), y(t), l(x(t), y(t)))ᵀ of the boundary curve and the assumption that l(x(t), y(t)) = τ is constant ∀ t ∈ I, we can determine the geodesic curvature by

κ_g = (x′y″ − y′x″) / [(x′² + y′²)^{3/2} (1 + l_x² + l_y²)^{1/2}].   (7)
The parameter τ is considered as a threshold in the remainder of this article. The additional assumption of constant height allows us to eliminate the second derivatives of the parametrization. The second integral in Equation 4 thus becomes

∫_Γ κ_g ds = ∫_{ℜ²} κ̃_g(x, y) χ_Γ(x, y) d(x, y),   (8)

where χ_Γ is the characteristic function with respect to the set Γ. Assuming constant height, the first derivative of the parametrization yields x′ = −(l_y/l_x)y′. Consequently, the geodesic curvature depends only on differential operators applied to the luminance function and one first derivative x′ or y′ of the parametrization of the boundary curve. Finally, the numerosity of objects N from a union S of pairwise disjoint regions of luminance surfaces can be determined by

N = (1/2π) ∫_{ℜ²} [K̃(x, y) χ̃_S(x, y) + κ̃_g(x, y) χ̃_Γ(x, y)] d(x, y),   (9)

where K̃ := (l_xx l_yy − l_xy²) / (1 + l_x² + l_y²)^{3/2} and κ̃_g := (l_xx l_y² − 2 l_xy l_x l_y + l_yy l_x²) / [(l_x² + l_y²)^{3/2} (1 + l_x² + l_y²)^{1/2}], up to the orientation sign carried by x′ or y′. The computation now depends on the three quantities K̃, χ̃_S, and κ̃_g, which depend on surface properties, and on the quantity χ̃_Γ, which has to extract curve properties. All quantities have in common the fact that they represent local properties of the luminance surface.  
Regarding the implementation, images are assumed to be positive real-valued matrices. In order to replace the continuous operators by their discretized approximations, we have to guarantee sufficient smoothness of the luminance surface; if this is not satisfied, large numerical errors can emerge. In order to obtain differentiability in the discrete representation, each stimulus is additionally low-pass filtered using a Gaussian kernel g : [−1, 1] × [−1, 1] → ℜ with fixed standard deviation σ = 0.04 for all computations considered within this article (if not otherwise mentioned). The images are assumed to be defined by functions l : [−1, 1] × [−1, 1] → [0, 1]. The discretization, done with a step size of Δx = Δy = 1/50, results in image matrices with 100 × 100 pixels. The threshold height τ is determined by half the maximum luminance value within an image. 
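For concreteness, the following is a minimal numerical sketch of this discretized estimator, based on our reconstruction of Equation 9 above. The simplifying choices are ours: derivatives via central differences (np.gradient), the boundary curve Γ approximated by a one-pixel ring of the thresholded region with an arc length of roughly one pixel per boundary point, and a sign convention chosen so that bright objects on a dark background contribute positively. It illustrates the computation; it is not the authors' reference implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, binary_erosion

def estimate_numerosity(img, sigma=2.0, tau_frac=0.5):
    # Regularize the luminance function (low-pass filtering).
    l = gaussian_filter(img.astype(float), sigma)
    ly, lx = np.gradient(l)            # first derivatives (rows = y, cols = x)
    lyy, lyx = np.gradient(ly)         # second derivatives
    lxy, lxx = np.gradient(lx)

    g = 1.0 + lx**2 + ly**2            # 1 + |grad l|^2 (metric term)
    # Interior term K-tilde: Gaussian curvature times the area element.
    K_tilde = (lxx * lyy - lxy**2) / g**1.5

    # Boundary term kappa_g-tilde: geodesic curvature of the level curve,
    # with second derivatives of the parametrization eliminated.
    eps = 1e-12                        # guards against division by zero
    grad2 = lx**2 + ly**2
    kg_tilde = -(lxx * ly**2 - 2 * lx * ly * lxy + lyy * lx**2) / (
        (grad2 + eps)**1.5 * np.sqrt(g))

    S = l > tau_frac * l.max()         # region above the threshold tau
    gamma = S & ~binary_erosion(S)     # one-pixel boundary ring as Gamma

    # Equation 9: every simply connected component contributes 2*pi.
    total = K_tilde[S & ~gamma].sum() + kg_tilde[gamma].sum()
    return total / (2.0 * np.pi)

# Toy usage: two well-separated bright blobs should give a value near 2
# (the crude boundary discretization introduces some deviation).
img = np.zeros((100, 100))
img[30, 30] = img[70, 65] = 1.0
print(estimate_numerosity(gaussian_filter(img, 5.0)))
```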
Neural implementation
A crucial question which immediately arises for any model based on heavily theoretical arguments is whether such a model is still neurobiologically plausible. We show that the computation relies on four important properties which have been reported to be already available in the early visual cortex (up to V2): orientation selectivity—a basic property of linear filter functions in V1 (Daugman, 1985); 2-D selectivity—selectivity to curved features and 2-D features has been reported for V1 and V2 (Hashemi-Nezhad & Lyon, 2012; Ito & Komatsu, 2004); gain control—an essential normalization property of networks (Carandini & Heeger, 2012); and integration—spatial summation as a special case of a linear filter operation. We derived a possible architecture for the implementation of Equation 9, which is illustrated in Figure 5. The processing direction is from left to right. First the input is filtered with a 2-D Gaussian kernel. The filtered input is then used to compute the first- and second-order derivatives of the luminance function as well as the threshold value used to control the boundary curve. The computation of the derivatives builds the second path (blue block “Linear filter”) with its origin in the filtered input image. The threshold value in the model depends on the maximum of the whole Gaussian-filtered luminance function. Given the derivatives of the luminance function, a set of multiplications must be realized; compare the red block “GC-Product” in Figure 5. The resulting products then must be summed (blue block “GC-Sum”). The output of this stage is fed into a ratio-computation stage (block “GC-Norm”). After this stage, we finally have two local nonlinear features available, K̃(x, y) and κ̃_g(x, y). These are assumed to be computed across the whole visual field but are selectively gated by the threshold mechanism such that κ̃_g values are only passed at the boundary locations, and K̃ values are only passed within the interior regions to the final global summation stage (blue block “Integration”). This global summation provides the estimate of the numerosity of the input pattern. 
Figure 5
 
A possible architecture of the proposed model and how error is introduced in the system. The information-processing direction is from left to right; each box represents one processing step. The processing steps in which noise is included have a slightly darker background color. The open circle in each box is a placeholder for the input coming from the left direction. Linear operation stages are blue, and nonlinear ones are red. The stimulus is fed into the system as a luminance function l in the input stage (green box). Then additive normally distributed noise η_in ∼ N(0, σ_in) is added. The convolution with a Gaussian kernel g and application of the differential operator can be interpreted as one linear-filter processing stage—i.e., the convolution with the respective derivative of the Gaussian kernel. The results of the linear filter operation are fed into the next noise-adding unit, where different samples of η_lin ∼ N(0, σ_lin) are added to each input. The following multiplication (GC-Product), summation (GC-Sum), and division (GC-Norm) build the cortical gain-control stage, which results in an estimate of the required curvature quantities. Finally, the curvature quantities are integrated (blue box), resulting in an estimate for the numerosity N.
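Read as pseudocode, the noisy front end of Figure 5 might look as follows (a sketch under our own assumptions about where the noise enters and which derivative filters are used; the σ values are placeholders, not the paper's fitted parameters):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)

def noisy_front_end(img, sigma=2.0, sigma_in=0.01, sigma_lin=0.01):
    # Input stage with additive noise eta_in ~ N(0, sigma_in).
    l = img + rng.normal(0.0, sigma_in, img.shape)
    # Gaussian-derivative filtering: first- and second-order derivatives.
    orders = [(0, 1), (1, 0), (0, 2), (2, 0), (1, 1)]
    responses = [gaussian_filter(l, sigma, order=o) for o in orders]
    # Independent noise eta_lin ~ N(0, sigma_lin) on each filter output.
    return [r + rng.normal(0.0, sigma_lin, r.shape) for r in responses]

outputs = noisy_front_end(np.random.rand(100, 100))
```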
The architecture requires the computation of several linear and nonlinear features across the visual field. Nominally, this results in a much higher computational complexity than is required for the models of Dakin et al. (2011) or Stoianov and Zorzi (2012). However, a comparison must also take into account the existing structure of the visual cortex and the generality of the model. Regarding the first aspect, we describe later in detail how basically all required computations are known to exist in visual cortex (V1 and V2). In particular, there is an overcomplete representation of the input (Barlow, 2013), which includes not only a complete orientation-selective decomposition but also a provision of divisive normalization and selectivity for curvature. In this sense, the proposed model does not imply a significantly higher computational complexity than is assumed to exist in visual cortex. 
Nevertheless, according to Occam's razor a simpler model always has to be preferred if it has the same explanatory power. This brings us to the second aspect that has to be considered in the comparison of models with respect to their computational complexity. We argue in this article that the invariance desirable for the computation of numerosity should include an invariance to shape properties. The invariance provided by the models of Dakin et al. (2011) and Stoianov and Zorzi (2012) is more restricted than the invariance of our model (see Discussion). It is clear that with restricted demands on invariance it is possible to find a computationally simpler algorithm. What is unclear is which degree of invariance is provided by the human system. The proper application of Occam's razor thus becomes an empirical question about the invariance properties of human numerosity perception. 
Now let us consider the neurophysiological plausibility of the computations in our model in a step-by-step fashion. The Gaussian filtering operation is commonly assumed to be available from the earliest stage of the visual system; in particular, it could be realized in the ganglion cells of the retina (Kuffler, 1953; Marr & Hildreth, 1980). The key operations in the model are directional derivatives of first and second order (combined with the Gaussian filter for regularization). Translated into receptive-field properties, these are orientation-selective mechanisms with odd and even symmetry. It is well known that Gaussian derivatives are well suited for the description of neurons in the primary visual cortex (e.g., Koenderink & van Doorn, 1990; Lindeberg, 2013; Marr & Ullman, 1981; Martens, 1990; Young, 1987; Young & Lesperance, 2001). In particular, it has been argued that they can be approximated in a plausible fashion as difference-of-offset Gaussian filters (Young, 1987; Young & Lesperance, 2001). 
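As a concrete illustration of this point, the combination of Gaussian regularization and directional derivative used in the model is itself a single linear filter—a Gaussian-derivative receptive field. A minimal sketch (our example using scipy, not the authors' code):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

img = np.random.rand(100, 100)   # stand-in for a luminance function l

# Odd-symmetric, orientation-selective RF: first x-derivative of a Gaussian.
l_x = gaussian_filter(img, sigma=2.0, order=(0, 1))
# Even-symmetric RF: second x-derivative of a Gaussian.
l_xx = gaussian_filter(img, sigma=2.0, order=(0, 2))
```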
The second important type of computation in the model is the nonlinear extraction of curvature. Selectivity to curvature is a well-known basic selectivity of cortical neurons (Dobbins, Zucker, & Cynader, 1987; Zetzsche & Barth, 1990a, 1990b). In addition to the aforementioned linear filtering operations, this requires a nonlinear AND-like multiplication of two signals. One possibility is that this could be directly realized in the dendritic tree of a neuron (Koch & Segev, 2000; Mel, 1993). Alternatively, combinations of subunits could be used to realize the Babylonian trick and compute the product by the sum of squared spatial-filter outputs (Adelson & Bergen, 1985; Resnikoff & Wells, 2015; Zetzsche & Barth, 1990a)—i.e., ab = (1/4)[(a + b)² − (a − b)²]. Finally, it can be shown that the nonlinear mechanism of cortical gain control can be combined with subsequent nonlinear transducer functions to realize an AND-like interaction (Zetzsche & Nuding, 2005). It has thus been argued that curvature-selective operators can be realized without problems by the available cortical hardware (Zetzsche & Barth, 1990a, 1990b). Furthermore, it has been shown that certain forms of end stopping, of the hypercomplex property, and of extraclassical receptive-field properties show a close relation to the computation of curvature (Zetzsche & Barth, 1990a, 1990b; Zetzsche & Nuding, 2005; Zetzsche & Roehrbein, 2001). Even evidence for the existence of an isotropic variant is provided by so-called dot-responsive cells (Saito, Tanaka, Fukada, & Oyamada, 1988). For this, the involved products must be summed (blue block “GC-Sum”), but this is an undisputed ability of neurons. 
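The quarter-square identity itself is easy to verify numerically (a trivial self-contained sketch):

```python
import numpy as np

a = np.random.rand(100, 100)   # two subunit (spatial-filter) outputs
b = np.random.rand(100, 100)

product = 0.25 * ((a + b)**2 - (a - b)**2)   # AND-like combination
assert np.allclose(product, a * b)           # equals the direct product
```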
The output of the basic curvature-selective computation (the numerator) is fed into a divisive operation (GC-Norm). That neurons can implement such an operation is shown by the established role of the divisive normalization mechanism in models of the visual cortex (Abbott, Varela, Sen, & Nelson, 1997; Carandini & Heeger, 2012; Geisler & Albrecht, 1992; Heeger, 1992; Schwartz & Simoncelli, 2001). It has even been argued that such an operation should be regarded as a “canonical” neural computation (Carandini & Heeger, 2012). The gain-control layer, which is illustrated by the rightmost red box in Figure 5, results directly in an estimate for the required curvature quantities K̃(x, y) and κ̃_g(x, y). 
Although it can be assumed that these features are computed throughout the visual field, the implementation of Equation 9 requires a selective gating before feeding them into the global summation stage. The first part of this process is the determination of a level for the boundary curve. For simplicity, we have here assumed that this is achieved by selecting the 50% level of the maximum luminance value. Neural computation of the maximum is a well-known principle discussed in the context of winner-take-all networks (Kaski & Kohonen, 1994; Mead & Ismail, 2012); here it sets the value for the boundary, which in turn determines the region where the surface curvature operations are integrated. Maximum operations are also routinely used in models of the visual cortex (Lampl, Ferster, Poggio, & Riesenhuber, 2004; Serre, Wolf, & Poggio, 2005). However, we do not put special weight on the use of this principle since, for the successful application of Equation 9, the specific method used to determine a boundary curve plays no crucial role. It might as well be based on some other mechanism, e.g., on the determination of some equilibrium level (Dayan & Abbott, 2001; Grossberg, 1988). Once the boundary is selected, what remains to be achieved is the gating operation—i.e., controlling which values are passed to the global integration mechanism. This type of operation is a special case of the general principle of neural gain modulation, which is an essential neural mechanism in sensorimotor processing (Salinas & Thier, 2000) and is also used in the visual cortex (Reynolds & Chelazzi, 2004). 
To summarize: 
  •  
The curvature operators K̃ and κ̃_g depend only on derivatives of the luminance function, which can be realized by neurophysiologically realistic orientation-selective filters.
  •  
    The nonlinear multiplicative AND-like combination of these features can be computed by a variety of plausible neural mechanisms.
  •  
    The ratio operations are closely related to the well-established principles of cortical divisive normalization and gain control.
  •  
    The outputs of the curvature operators are globally integrated in a gated fashion, which is also an established neural principle, known as gain modulation.
The computation of Equation 9 can thus be achieved by the available neural hardware in the visual cortex. 
Basic properties
The model's behavior for stimuli commonly used in vision research is illustrated in Figure 6. More complex stimuli are shown in Figure 7. Variations of elements and configurations for which the model has been tested include cumulative area, morphing, convex–concave properties, luminance, mixtures of element sizes, shape variations, and texture properties. The full test set is much larger than shown here and is described in the Appendix. 
Figure 6
 
Various stimuli (size 100 × 100 pixels) and the corresponding responses of the model defined by Equation 9. Stimuli vary with respect to different criteria like (a) cumulative area, (b) number change by morphing, (c) convex and concave shape, (d) contrast of a single object, and large numbers of (e) convex and (f) concave objects.
Figure 7
 
Various stimuli (size 100 × 100 pixels) and the corresponding responses of the model defined by Equation 9 (N2: σ = 2Δx; N4: σ = 4Δx; Nt: σ = 2Δx; and the threshold τ is 0.3 times the maximum luminance value). Stimuli vary with respect to different criteria like (a) grating texture, (b) convex and concave object morphing, increasing number of convex and concave (c) binary and (d) grating-textured objects, (e) textured concave objects, and (f) contrast variations of multiple rectangles.
The model shows a strong invariance with respect to the cumulative area of objects (Figure 6a). Furthermore, it is also invariant to changes in object shape from convex to concave (Figure 6c). Figure 6b shows a circle morphed into two circles. The model only shows a strong change in response when the two circles are not connected anymore (bottom stimulus in Figure 6b; the slight change from the top stimulus to the middle one is due to numerical instability at the connecting points of the contours). This selectivity with respect to numerosity is one of the most important properties of the model, and it is also present for larger numbers of objects (Figure 6e, f). 
By construction, the proposed model is contrast invariant to a certain degree. The threshold, which is defined relative to the maximum contrast, causes the invariance illustrated in Figure 6d. The slight changes are caused by numerical differences in the curvature computation, which depends on the absolute level of luminance. Contrast invariance is also present for multiple objects, as can be seen in the top and middle stimuli of Figure 7f. The bottom stimulus shows a configuration of objects with different contrasts. In this case the threshold determining the integration domain was chosen to be smaller, i.e., 30% of maximum luminance (denoted by Nt). The invariance with respect to contrast is preserved, as the model output changes only slightly. Note, however, that the contrast invariance has limits: The lower the threshold, the higher the influence of the standard deviation σ of the Gaussian filter, and hence the larger the minimum spatial distance between two objects that is necessary to distinguish them in terms of numerosity. One should therefore expect a systematic underestimation of numerosity for stimuli that mix low- and high-contrast objects placed at the critical distance determined for constant contrast. For further illustrations of illuminated 3-D objects and threshold-dependent responses, we refer the reader to Kluth and Zetzsche (2014). 
We also tested the model on more complex stimuli, including various numerosities of convex and concave objects with different luminance patterns (constant, sinusoidal gratings, and texture; compare Figure 7a through e). In all cases with piecewise constant luminance patterns, the model (N = N2), which uses the parameters that are also used for the further investigations in this article, works well and yields the desired selectivity and invariance properties. In a few cases the model does not work optimally. In particular, the introduction of sinusoidal gratings (compare Figure 7a [middle], b [bottom], and d) or texture (compare Figure 7e) on the objects results in larger deviations from the expected model output. Such luminance patterns require a certain degree of regularity over a larger spatial area for the differentiation; where this regularity is lacking, the size of that area determines the error in the computation of the derivatives. We tested the influence of this error by increasing the standard deviation of the Gaussian filter, which increases the regularity of the differentiated signal. Using twice the standard deviation (N4; σ = 4Δx) instead of the default (N2; σ = 2Δx) makes the model a perfect enumerator (deviations smaller than 0.02) for constant as well as sinusoidal luminance patterns, as can be seen in all cases of Figure 7a through d. The deviations in the texture case are slightly higher, owing to the higher degree of regularity required on the objects. Further increasing the standard deviation solves this issue but could in principle cause a spatial pooling of close objects or an exclusion of small objects. For σ = 6Δx and a threshold of 30% of maximum luminance, the outputs for the middle and bottom stimuli of Figure 7e become 4.0145 and 7.0336. There is no indication of a spatial pooling effect in this case, and the model output is improved by these parameters. 
That connectedness matters has also been shown by Franconeri et al. (2009) and He, Zhang, et al. (2009). Figure 8a illustrates the kinds of stimuli used in those studies together with the corresponding model responses. A line connecting two circles strongly affects the response of the model: The response is reduced with every inserted line from the top to the bottom stimulus. This implies that perceived numerosity should be underestimated when connecting lines are introduced, in line with the findings reported by He, Zhang, et al. (2009). The stimuli are compared to stimuli with a larger image size (compare Figure 8a through c). The relative size of the circles is constant for all stimuli, but the relative width of the connecting lines varies from Figure 8a to c. In each column we observe an approximately constant difference in response from top to bottom, which illustrates the desired selectivity of the numerosity estimate with respect to the number of objects present in the stimulus. The model responses are constant within each row, which is due to the increase of σ with the image size: Numerical errors in the gradient computation remain sufficiently small. But the increase of σ also implies that the sensitivity to the connecting lines persists only down to a certain line width. The thinner the connecting lines, the less influence they have. 
Figure 8. Various stimuli of three circles with different numbers of connecting lines and the corresponding model responses (Nk: σ = kΔx, k = 4, 6, 8). The image size is varied from (a) 100 × 100 pixels to (c) 200 × 200 pixels. The relative size of the circles is constant. The absolute width of the connecting lines is also constant, resulting in a decrease in relative width to circle size.
Simulation experiments on Weber fraction
In the following we extend the proposed model such that noise is taken into account. We modeled additive normally distributed noise ηin ∼ N(0, σin) at the input and neural noise ηlin ∼ N(0, σlin) at the linear filter outputs. Both curvature operators K̃ and κ̃g are influenced by both types of noise. The structure of the curvature operators as products of noise-affected variables is a prototypical configuration for yielding a log-normally distributed quantity (Buzsáki & Mizuseki, 2014); this noise behavior has been reported in several studies regarding approximate-number estimation. Figure 5 illustrates the way noise influences the system which computes the Gaussian and the geodesic curvature. The digital quantity described by the perfect model thus becomes an analog quantity from which the desired information must be extracted. In the following, we focus on two standard tasks to analyze the proposed model and to compare its performance with behavioral results from experiments with humans: deciding whether a stimulus shows a specific number or not (same–different), and deciding whether a stimulus is more numerous than a reference stimulus (larger–smaller; Piazza et al., 2004). 
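A minimal sketch of these two noise sources, assuming the parameter values motivated below (σin = 0.4, σlin = 2; the function name and structure are our own illustration):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)

def noisy_filter_outputs(image, sigma=2.0, sigma_in=0.4, sigma_lin=2.0):
    """Additive Gaussian noise eta_in on the input luminance and eta_lin on
    every linear (derivative) filter output. The curvature operators, being
    products of these noisy variables, then yield approximately log-normally
    distributed quantities."""
    l = image.astype(float) + rng.normal(0.0, sigma_in, image.shape)
    orders = {"lx": (0, 1), "ly": (1, 0),
              "lxx": (0, 2), "lyy": (2, 0), "lxy": (1, 1)}
    return {name: gaussian_filter(l, sigma, order=o)
                  + rng.normal(0.0, sigma_lin, image.shape)
            for name, o in orders.items()}
```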
All investigations are based on four synthetic data sets of binary images generated in a similar fashion to that described by Stoianov and Zorzi (2012). The 100 × 100–pixel binary images contained up to 32 randomly placed nonoverlapping objects with variable surface area and shape. The cumulative area of the objects was varied from 288 to 2,304 pixels in steps of 288, resulting in eight levels of cumulative area. For each tuple of number and cumulative area, 200 images were computed; each data set thus consists of 51,200 binary images (32 numbers × 8 area levels × 200 images). The data sets are distinguished by the shape of the elements and by whether they are used for training or testing. One data set (R) has rectangular objects, as used for the analysis by Stoianov and Zorzi (2012); the others consist of rectangular objects with smoothed corners (RC) and of circular objects (C), which are commonly used in behavioral experiments. All three are used as test sets. We generated a fourth data set (TR), consisting of rectangles, to obtain the parameters of the optimal estimators. In addition, data sets with concave objects (C1, C2, C3) and randomly shaped objects (RA, RA200, RA400) were evaluated. More details and sample images of the data sets can be found in the Appendix. 
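A simplified sketch of such a stimulus generator (our own variant; the exact procedure of Stoianov and Zorzi, 2012, may differ in detail):

```python
import numpy as np

rng = np.random.default_rng(1)

def make_stimulus(n_objects, cum_area, size=100, max_tries=10000):
    """Places n_objects nonoverlapping axis-aligned rectangles whose areas
    sum approximately to cum_area. A sketch of the generation procedure."""
    img = np.zeros((size, size), dtype=np.uint8)
    target = cum_area / n_objects          # mean area per object
    placed = 0
    for _ in range(max_tries):
        if placed == n_objects:
            break
        h = int(rng.integers(2, int(2 * np.sqrt(target)) + 2))
        w = max(1, int(round(target / h)))
        if h >= size - 1 or w >= size - 1:
            continue
        y, x = int(rng.integers(0, size - h)), int(rng.integers(0, size - w))
        # require a one-pixel margin so that objects stay disconnected
        if img[max(0, y - 1):y + h + 1, max(0, x - 1):x + w + 1].any():
            continue
        img[y:y + h, x:x + w] = 1
        placed += 1
    return img

# eight cumulative-area levels (288, 576, ..., 2304 pixels), as in the text
areas = [288 * k for k in range(1, 9)]
stimulus = make_stimulus(n_objects=8, cum_area=areas[3])
```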
We trained optimal estimators tuned to a specific numerosity Ntuned and a fixed task. Detailed information about the optimal estimators can be found in the Appendix. See Table A4 in the Appendix for an overview of the trained parameters given the noise levels which are motivated later. The joint relative frequencies of the true numerosities and the response of the estimator were obtained from three distinct data sets (rectangles [R], smoothed corners [RC], circles [C]). We then fitted a continuous function to the conditional relative frequencies of the estimator's response given the true numerosity to obtain the internal Weber fraction. Further technical details can be found in the Appendix
Before we start considering noise in the system, we investigate the output of the noise-free model on each data set. The mean model output and the corresponding standard deviation for each true number of objects within the stimulus are illustrated in Figure 9. All data sets have in common that the model predictions are close to identity (black curve) and have very small standard deviations for small numbers. In this case, the mathematical model is approximated well by the discretization. Only for large numbers and for data set C does the standard deviation increase. (The possible reasons are discussed in detail in the Appendix.) 
For the noise-affected model, the remaining free parameters controlling the system behavior are the standard deviations σin and σlin of the additive noise in the system. In order to make reasonable assumptions about the noise parameters, we analyzed the data sets with respect to their signal-to-noise ratios (SNR) at different stages of the architecture (see Table A1); the technical details can be found in the Appendix. The SNR values at the input stage plotted against the noise parameter σin can be found in Figure 10a. Analogously, the SNR curves for the derivatives of the luminance function with respect to σlin are illustrated in Figure 10b. For further consideration of the information encoded by a single unit, the mean of the curves was computed. Note that the mean curve and the curve for the luminance l coincide in Figure 10a; the same holds for the pairs lx/ly and lxx/lyy in Figure 10b. Figure 10c shows the rate required to encode the signals at the different stages at the corresponding SNR without loss of signal quality; details about the relation between SNR and rate can be found in the Appendix. This rate can be related to findings in the literature on the encoding rates of single neurons: Transmitted information in the range of 5 to 30 bits per second has been reported for single neurons (Reich, Mechler, Purpura, & Victor, 2000). Assuming a mean firing-rate code and a time window of approximately 200 ms, we get a rate interval of 1–6 bits, which can be compared with the rates in Figure 10c (gray window). At the linear filter stage we thus chose the standard deviation corresponding to half the maximum of the reported rate interval, i.e., σlin = 2; at the input stage we chose the worst case, σin = 0.4. These two cases are highlighted with red circles in Figure 10, and these parameters were used to simulate the model behavior in both numerosity-judgment tasks. Note that the mean peak SNR of the first layer is approximately 40 dB higher than the respective SNR (compare Table A1); a higher rate of approximately 6–7 bits would be required to guarantee the same signal quality over the whole signal amplitude. Multiple units encoding the same quantity could be used to overcome this problem. 
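The SNR-to-rate relation is derived in the Appendix; one reading consistent with the numbers in the text (40 dB corresponding to roughly 6–7 bits) is the Gaussian rate–distortion function R = ½ log2(SNR) (cf. Gray & Neuhoff, 1998). A sketch under this assumption:

```python
import numpy as np

def rate_from_snr_db(snr_db):
    """Bits per sample needed to encode a Gaussian source at a given SNR,
    via the Gaussian rate-distortion function R = 1/2 * log2(SNR).
    (Our reading of the SNR-rate relation; details are in the Appendix.)"""
    return np.maximum(0.0, 0.5 * np.log2(10.0 ** (snr_db / 10.0)))

print(rate_from_snr_db(40.0))  # ~6.6 bits: a 40-dB SNR needs ~6-7 bits
print(rate_from_snr_db(12.0))  # ~2 bits, inside the 1-6 bit interval
```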
Figure 9. Mean model output with standard deviation plotted against the true number in the stimuli for the zero-noise case (σin, σlin) = (0, 0) for different data sets: (a) TR, (b) R, (c) RC, and (d) C.
Figure 10. SNR curves with respect to the standard deviation of the noise for data set R for (a) the input stage and (b) the linear filter stage. The plotted information of (a) and (b) can be found in Table A1. (c) The required mean bit rates to encode the signals of each stage. The noise parameters chosen for the empirical investigation of numerosity-judgment tasks are highlighted with red circles in (a–c).
The results for the rectangle data set (R) with the specified noise-parameter configuration are shown in Figure 11, and the mean model outputs for the noise case can be found in Figure A9. In both tasks, the performance of the artificial estimators closely resembles that of human observers (Piazza et al., 2004). On the linear axis shown in the first row of Figure 11, the variance increases with the tuned numerosity in the same–different task, and the steepness decreases with the tuned numerosity in the larger–smaller task. If we consider the same data on a logarithmic scale of the true numerosity, these effects are no longer observable, as can be seen in the second row. The logarithmic representation with constant uncertainty is one possible representation (Feigenson et al., 2004) for which neural evidence has been found (Nieder & Miller, 2003). The third row shows the same data plotted against the logarithm of the ratio between the true numerosity and the tuned numerosity of the estimator. For all tuned numerosities, the curves are nearly identical, which implies behavior governed by the Weber–Fechner law. All data points on the logarithmic ratio scale were used to fit one continuous function (black curve). The tuning curves were used to obtain the internal Weber fraction w, an index describing the discriminability of two numbers. In both tasks we find internal Weber fractions in the range of human behavior: 0.167 in the same–different task and 0.169 in the larger–smaller task on the rectangle data set (R). In comparison, Piazza et al. (2004) reported corresponding Weber fractions of 0.17 and 0.174. 
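The fitting step can be illustrated with a cumulative Gaussian in the log ratio whose standard deviation is read out as the internal Weber fraction w (a simplified sketch; the data points below are hypothetical, and the exact fitting functions are described in the Appendix):

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import erf

def p_larger(log_ratio, w):
    """Cumulative Gaussian in log(n / N_tuned); w is the internal Weber
    fraction (one common parameterization, cf. Piazza et al., 2004)."""
    return 0.5 * (1.0 + erf(log_ratio / (np.sqrt(2.0) * w)))

# hypothetical pooled data: log(n / N_tuned) and the relative frequency
# of the response "larger"
log_ratio = np.log(np.array([8, 12, 14, 16, 20, 24, 32]) / 16.0)
freq = np.array([0.02, 0.13, 0.31, 0.50, 0.82, 0.94, 1.00])

(w_fit,), _ = curve_fit(p_larger, log_ratio, freq, p0=[0.2])
print(f"internal Weber fraction w = {w_fit:.3f}")
```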
Figure 11. Computational results from same–different and larger–smaller tasks where the estimators were trained on data set TR and the evaluation was done on data set R. The noise parameters were σin = 0.4 and σlin = 2, and the corresponding estimator parameters can be found in Table A4. In the left column, the graphs show the relative frequencies of the estimators' response to “different,” and in the right column, the relative frequencies of the response “larger.” In both cases, the relative frequencies are plotted against functions of the true numerosities in the stimulus. The resulting data points from the evaluation process are shown as squares, rhombuses, and triangles. The continuous graphs are the fitted log-normal distributions as described in the Appendix. The graphs are skewed on a linear scale (a, d) and become symmetric on a logarithmic scale (b, e). Plotting the frequencies against the ratio with the reference numerosity Ntuned on a logarithmic scale shows that the graphs for all reference numerosities are approximately covering each other (c, f). Using data points of all reference numerosities in the logarithm of the ratio scale to fit the continuous functions yields the overall numerosity-discrimination ability (black curve in c, f) regarding the previously illustrated reference numerosities.
In summary, the computational results agree very well with behavioral results (Halberda et al., 2008; Piazza et al., 2004; Piazza et al., 2010) and can compete with recent computational results of a neural-network model (Stoianov & Zorzi, 2012; w = 0.15 in larger–smaller task). Furthermore, the proposed model is able to reproduce the desired behavioral results on data sets covering a broader range of object shapes, as can be seen in Table 1. Only random shapes for a small image size appear somewhat problematic (for an explanation, see the Appendix). All other shapes, from rectangular to circular objects, different kinds of concave objects, and even larger randomly shaped objects, result in quite similar Weber fractions. 
Table 1. Parameters of the continuous data-fitting functions for concave data sets. Notes: In all cases the noise levels were σin = 0.4 and σlin = 2. The parameters of the optimal estimators were obtained from data set TR.
Discussion
In this article, we proposed a general mathematical approach to the computation of numerosity. We suggested formalizing the problem and its solution by considering numerosity as a mathematical invariant. We then developed a model which takes into account the restrictions imposed by the structure of the visual system and known neurophysiology. The proposed model was applied to visual stimuli and finally validated in computational experiments. 
We first considered the problem on an abstract level, with the goal of providing a formal mathematical solution without having to consider the specific constraints imposed by the human visual system and the neural computations of the visual cortex. For this, we formalized it as the problem of an assignment of the property number to a configuration c of objects in 3-D space ℜ3 (Figure 1). We then argued that number specified in this sense has to be regarded as a mathematical invariant (Figure 2). This simple fact is in our view a central contribution of this article: The understanding and modeling of numerosity should be considered as an investigation of the specific invariance properties that have to be provided by the perceptual processes and neural computations. 
The special nature of this invariant—numerosity does not depend on the shape of the objects involved—suggested that we consider the problem in the context of topology. Since the crucial property that constitutes an object is the fact that it occupies a connected region of space, the appropriate formal invariant turned out to be the zeroth Betti number, which counts the number of connected components of a topological space. It should be noted that this invariant represents the ideal solution to the problem of numerosity, and should hence be taken into account in future investigations, both in experimental and in model research. However, if we consider its computational properties, it turns out that the Betti number is not compatible with the established properties of numerosity perception: The biological numerosity system acts quite fast (Park et al., 2015; Roitman et al., 2007) and is based on parallel computations, since there exists no hint of a dependence of processing time on the number of elements in the set (Mandler & Shebo, 1982). Connected-component-labeling algorithms from digital topology are able to estimate the number of components as a by-product of the labeling process, but these algorithms are inherently sequential and require multiple passes through the image (e.g., He, Chao, & Suzuki, 2007; Suzuki, Horiba, & Sugie, 2003). 
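For comparison, the zeroth Betti number of a binary image is readily obtained by such a sequential labeling pass, for example with an off-the-shelf implementation:

```python
import numpy as np
from scipy.ndimage import label

img = np.zeros((100, 100), dtype=np.uint8)
img[10:20, 10:20] = 1          # two separated squares
img[40:55, 60:80] = 1
_, n_components = label(img)   # sequential labeling pass inside
print(n_components)            # 2 = zeroth Betti number of the pattern
```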
This led us to consider an alternative approach that is suited for parallel implementation. It turned out that such a solution can be achieved if we are willing to accept slightly more idealized conditions regarding the definition of what constitutes an object. If we consider objects as simply connected entities in topological terms, a solution can be found which is fully parallel and also has a high degree of neurophysiological plausibility, since the required invariant then becomes equivalent to the Euler characteristic χ(C). This in turn is computable via the famous Gauss–Bonnet theorem by measuring in a local, point-wise fashion the surface curvature of the objects in the set. By this we have specified numerosity as an invariant that can be computed in parallel by making use of local mathematical operations (derivatives). 
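In the discrete binary setting, the parallel nature of such a solution can be illustrated with the classical local-counting formula for the Euler characteristic (Gray, 1971): every 2 × 2 pixel neighborhood contributes independently, so the computation is a single parallel pass followed by a global sum. A sketch for illustration, not part of the model:

```python
import numpy as np

def euler_characteristic_4conn(img):
    """Euler characteristic of a binary image from purely local 2x2
    pixel-quad counts (4-connectivity): chi = (C1 - C3 + 2*Cd) / 4.
    For 8-connectivity the diagonal term enters with a minus sign."""
    p = np.pad(np.asarray(img, dtype=np.uint8), 1)
    q00, q01 = p[:-1, :-1], p[:-1, 1:]
    q10, q11 = p[1:, :-1], p[1:, 1:]
    s = q00 + q01 + q10 + q11
    c1 = np.sum(s == 1)     # quads with exactly one pixel set
    c3 = np.sum(s == 3)     # quads with exactly three pixels set
    cd = np.sum((s == 2) & (q00 == q11) & (q01 == q10) & (q00 != q01))
    return (c1 - c3 + 2 * cd) / 4.0  # = count of simply connected objects

img = np.zeros((50, 50), dtype=np.uint8)
img[5:15, 5:15] = 1
img[25:40, 20:45] = 1
print(euler_characteristic_4conn(img))   # 2.0
```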
Up to here, our analysis was abstracted from the specific conditions of the visual system insofar as the objects and their surfaces are assumed to be defined in 3-D space. If we assume that the visual system is able to create some sort of 3-D representation of the environment (even a 2½-D representation may suffice; Marr, 1982), the problem of numerosity can indeed be solved by direct application of Gauss–Bonnet to this representation. However, we do not want to be logically dependent on this specific assumption, since there also exist theories which assume that the handling of 3-D information in the early stages of the visual system (e.g., for 3-D object recognition) is based on a representation which consists of multiple views of which each is only a 2-D image (Wallis & Bülthoff, 1999). 
We hence considered alternatives for how Gauss–Bonnet can be applied directly to the processing of 2-D images (optical projections of the 3-D configurations to 2-D retinal images). It turned out that this is basically possible by interpreting the 2-D luminance function as a surface (Figure 4). It can be shown that this leads to three further model variants that represent different strategies for dealing with deviations from the ideal conditions of the theory (Kluth & Zetzsche, 2014). In this article we use the specific variant of the model which takes the deviations along the object borders explicitly into account by a second term (compare Equation 4). It can be assumed that this yields the most stable model version, because it combines the robust large-area integration of the surface curvature with a correction term. 
Basing our analysis largely on formal mathematical considerations raises the immediate question of empirical support. Regarding neurophysiological plausibility, we have shown that the model can be realized by the neural hardware believed to be available in the visual cortex. The local core operations are first- and second-order derivatives which correspond to oriented mechanisms with odd and even symmetry. It is well known that Gaussian derivatives are well suited for the description of neurons in the primary visual cortex (e.g., Koenderink & van Doorn, 1990; Lindeberg, 2013; Marr & Ullman, 1981; Martens, 1990; Young, 1987; Young & Lesperance, 2001). The computation of the local curvature terms requires an AND-like multiplication, which can be provided by a number of neurophysiologically plausible mechanisms (Adelson & Bergen, 1985; Koch & Segev, 2000; Mel, 1993; Resnikoff & Wells, 2015; Zetzsche & Barth, 1990a; Zetzsche & Nuding, 2005). Furthermore, it has been shown that certain forms of end stopping, of the hypercomplex property, and of extraclassical receptive-field properties show a close relation to the computation of curvature (Zetzsche & Barth, 1990a, 1990b; Zetzsche & Nuding, 2005; Zetzsche & Roehrbein, 2001). The computation of the curvature terms further requires a ratio operation. This is very similar to the mechanism of cortical gain control which is available at early stages of the visual cortex and is regarded as a “canonical” neural computation of the cortex, which exists in various versions (Carandini & Heeger, 2012). 
Essential questions regarding the proposed model are whether it leads to clear testable predictions and how it compares to other existing models. We start with the comparison to other models that can be considered image-processing models; these range from models which process only binary images to full image-processing models, which accept an arbitrary gray-level image as input and compute the corresponding numerosity. As mentioned before, to our knowledge the first numerosity model of this type was suggested by Zetzsche and Barth (1990b). This model can be regarded as a simplified version of the present one, representing shapes as polygons with ±90° corners and straight segments aligned to the Cartesian grid. Aside from this it is based on the same invariance principle and leads to similar predictions, so we will not treat it as a separate model in the following discussion. The classic model by Dehaene and Changeux (1993) can be considered a restricted version of a full image-processing model, since it is only suited for 1-D images. Of greatest interest for the comparison are two more recent models which match the criteria: the “contrast energy model” (Dakin et al., 2011; Morgan, Raphael, Tibber, & Dakin, 2014) and the neural network of Stoianov and Zorzi (2012). The latter exists in two versions, as a full neural network and as an abstracted mathematical model (Cappelletti et al., 2014; Stoianov & Zorzi, 2012), which will be called the “network model” in the following comparison. 
The model of Dehaene and Changeux (1993), henceforth designated the “normalization model,” differs from the two other models and from ours not only in being one-dimensional but also with respect to the invariance principle being used. In the normalization model the invariance is achieved not in an implicit fashion by the distributed computation of local features but with an explicit normalization stage in which input objects are mapped to a standard representation that no longer varies with the shape properties of the input objects. This is achieved by regarding each object as a blob which is represented by a dedicated “blob detector.” This principle can generate a certain size invariance but cannot cope with the different shape variations which become fully apparent only in the 2-D case, for example in the form of elongated line-like elements. A single blob-matching system will not be able to bring into one standardized form such different element types as lines, discs, and nonconvex shapes—which can additionally be arranged in quite different spatial patterns—without interference between the invariance and the numbering properties. The existence of a blob-matching stage, as opposed to the spatially distributed computation of local features in the contrast energy model, the network model, and our model, is a structural property which should also be testable on the neural level. While the model of Dehaene and Changeux (1993) predicts the existence of local units which represent a single object in an invariant fashion, the invariance in the other models is an emergent property which becomes apparent only in the final spatial summation stage, where the contributions from several objects are combined. 
A comparison of these models can be based on two methods: consideration of the underlying invariance principle and direct comparison of the detailed computations. In the following we will make use of the first approach and consider how far the contrast energy model and the network model make use of alternative invariance mechanisms to the one proposed here. The second approach is described on a detailed level in the Appendix
Two basic shape properties seem somehow to be involved in the computations of the two models: the boundary of objects and their area. In the network model, the cumulative area of all elements is seen to play a crucial role as a covarying factor (Stoianov & Zorzi, 2012); and for the contrast energy model, the contour length of the elements is explicitly mentioned as one crucial factor in its computations (Morgan et al., 2014). On a closer look, the contrast energy model exists in two variants. In the first variant, the invariance is basically attributed to the high-frequency filtering stage, and the low-frequency filters are only considered for handling a moderate bias from the size of the stimulus configuration (Dakin et al., 2011). In this interpretation, the contours of the elements are suggested as a crucial factor in the computation, since the aggregated contour length is directly proportional to the number of objects (Morgan et al., 2014). However, without an additional compensation mechanism this solution would predict a strong influence of element size, since the size would directly influence the aggregated contour length. Deviating from the interpretation of Dakin et al. (2011) and Morgan et al. (2014), we will thus consider a second interpretation of the model, in which the low-frequency filters could contribute to size invariance by computing a cumulative area estimate. This is of special interest because the invariance properties of the network model are also suggested to be based on a trade-off term which computes cumulative area (Stoianov & Zorzi, 2012). 
Is it possible to compute the desired invariant by making use of contour length and area? It is a well-known fact that the ratio

q = A/L²,

with A being the area of an object and L its contour length, is a geometric invariant with respect to similarity transformations (which include size changes as well as position and orientation changes). We can thus use this invariance property of the ratio to compute the number N of objects of a configuration from the aggregated area ΣAi and the aggregated contour length ΣLi as

N = q · (ΣLi)² / ΣAi,

where all objects have the same arbitrary area Ai = A and share the same characteristic q. This means that we can scale the elements of a configuration by an arbitrary common factor, and we can arbitrarily rotate and shift the individual elements without causing any influence on the number estimate (compare Figure 2). 
However, although the ratio q is a geometric invariant in the sense described earlier, it is not the full topological invariant desired for the computation of number, since it still has a systematic dependency on the shape of the elements. It has long been known that the normalized variant of q, the so-called isoperimetric quotient Q = 4πA/L², is a measure of the compactness of a shape (Kesavan, 2006). This measure attains its maximum of 1 for circles but can easily become much smaller if the object is more ragged. For specific patterns, like the well-known Koch snowflake, Q can even approach 0. But also for commonplace patterns Q is often much smaller than 1. For U-like shapes, for example, Q is around 1/3, which would imply that three U-shaped elements should appear perceptually as numerous as nine disks. 
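A quick numeric check of this shape dependency (the U-shape below, a 20 × 20 square with a 10 × 15 notch, is our own illustrative choice):

```python
import numpy as np

def ratio_count(n, A, L, q_cal):
    """N = q * (sum of L_i)^2 / (sum of A_i); valid only if all elements
    actually share the calibrated characteristic q."""
    return q_cal * (n * L) ** 2 / (n * A)

q_disc = 1.0 / (4.0 * np.pi)            # q = A / L^2 for any disc
A_u, L_u = 250.0, 110.0                 # area and contour of the U-shape
print(4 * np.pi * A_u / L_u**2)         # Q ~ 0.26 (a disc gives 1.0)
print(ratio_count(3, np.pi * 10**2, 2 * np.pi * 10, q_disc))  # discs: 3.0
print(ratio_count(3, A_u, L_u, q_disc))  # U-shapes: ~11.6, overcounted
```

With Q exactly 1/3 the overcount factor is 3, giving the nine-disc example above; doubling all sizes leaves both counts unchanged, which is the size invariance of the ratio.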
It is not directly evident that the two models are exactly aimed at the computation of this ratio invariance, since only one of its parts is emphasized in each description of the models. But in the contrast energy model the low-frequency filtering can be seen to bear a certain resemblance to area-related computations. And in the network model, a contour-related feature can be seen to be implicitly computed, as explained in detail in the common framework analysis in the Appendix. However, we can draw the following general conclusion regarding all models that try to somehow make use of a trade-off between contour length and area to compute the number of objects: If the trade-off computations in these models deviate from the ratio invariant, they will lose the size invariance. If they manage to approach the ratio invariant, however, they will be subject to a substantial dependency on the shape of the elements. If we want to allow for arbitrary object shapes, making use of the ratio invariant thus seems not to be a viable solution. 
This also becomes apparent if we use the second approach for model comparison, consideration of the local features which are extracted by the different models. A detailed analysis can be found in the Appendix. It provides information about where the two other models differ from our approach, and this can also help in understanding how the different invariance properties are generated. The simplest case for understanding these effects is provided by the processing of binary images. For binary objects the interior region is completely flat, and the Gauss–Bonnet theorem tells us that no contribution should come from these interior points. Hence the only nonvanishing contributions should come from the object boundaries (compare the Appendix). In this sense it is a reasonable strategy to use only the contour region for the computation, as in the contrast energy model or, in an implicit fashion, in the network model (see the Appendix). However, it is crucial that not all contour points contribute in the same fashion to the final spatially integrated numerosity variable. Rather, the contribution should depend on the curvature of the boundary. In particular, there should be a zero contribution from straight contour parts. That the contribution from straight or low-curvature contours is not appropriately reduced by the contrast energy model and the network model is, in our view, the essential reason that systematic deviations from the ideal invariance properties have to be expected for these models. 
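This can be checked directly with the curvature-integration sketch given after Equation 9: for a binary rectangle all turning is concentrated at the four corners (4 × π/2 = 2π), so an elongated bar and a compact square both yield a count near 1, whereas an unweighted contour-length feature would scale with the aspect ratio.

```python
import numpy as np
# count_by_curvature is the sketch function defined after Equation 9 above
bar = np.zeros((100, 100)); bar[46:54, 10:90] = 1.0        # elongated bar
square = np.zeros((100, 100)); square[40:60, 40:60] = 1.0  # compact square
print(count_by_curvature(bar), count_by_curvature(square)) # both near 1
```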
In conclusion, models which are based on some trade-off between area and contour will show a systematic dependency on shape, element size, or both, and the same is true for models which rely only on contour features but provide no appropriate curvature-dependent weighting. 
It should be noted, however, that this does not necessarily imply that these models are not suited as models of human numerosity perception. In particular, if they do not realize the full size invariance but instead use some compromise which combines a medium size dependency with a medium shape dependency, it remains to be tested whether perceived numerosity does not exhibit the same form of deviation from perfect invariance. It is well known that the human numerosity system shows a systematic dependency on nonnumerical parameters like the size of the elements (e.g., DeWind, Adams, Platt, & Brannon, 2015; Hurewitz, Gelman, & Schnitzer, 2006; Ross, 2003). However, it has been argued that many studies focus only on finding a statistically significant effect instead of quantifying the exact quantitative relation between the parameter and the numerosity bias, and that they rather interpret the influence of the parameter as an estimation error (DeWind et al., 2015). If the quantitative influence of nonnumerical cues is instead explicitly modeled, the results suggest that the quantitative effect of the cues is relatively small for most individuals (DeWind et al., 2015). 
Nevertheless, the considerations from earlier suggest that systematic quantitative measurements of the influence of nonnumerical parameters on perceived numerosity are required in order to draw further conclusions about the different models. We would assume that the explicit consideration of invariance mechanisms and invariance properties, as exemplified in the present analysis, could be one valuable strategy for such a systematic analysis. It is clear, however, that the prediction from our model on the outcome of such investigations would be that the quantitative deviations from invariance should always be relatively small. 
Empirical tests of the invariance properties are in our view also essential with respect to the debate about a “true sense of number” as opposed to a texture-density-based mechanism (Dakin et al., 2011). It is clear that we should expect to find strong interrelations between density, numerosity, and cumulative area, since formally density and number are completely equivalent because they can be transformed into each other via the area. However, it would be justified in our view to regard numerosity perception as just a by-product of texture processing if it is derived from some texture-related processing, like band-pass filtering, and if the quantitative deviations from invariance are predicted by the properties of this mechanism. If, on the other hand, the invariance properties should turn out to be more compatible with our sort of approach, then, irrespective of whether the crucial variables in the system would covary directly or inversely with number, we would consider it adequate to speak of a sense of number. 
In this context it is also of interest to consider a major argument for a direct visual sense of number, namely that number seems to be a primary visual property like color, orientation, or motion, to which the visual system can be adapted by prolonged viewing (Ross & Burr, 2010). If we consider the invariance mechanism of our model, the only stage where adaptation would selectively influence the number property alone is the stage of the final summation units. Adaptation at this stage would cause a systematic influence even if we later test with entirely different elements and spatial configurations from the ones used during adaptation. However, systematic influences should also arise if we adapt the basic curvature features (compare, e.g., Bell, Gheorghiu, & Kingdom, 2009; Blakemore & Over, 1974; Hancock & Peirce, 2008). If we adapt to patterns with high local curvatures, for example, the contribution of the curvature mechanisms to the integral should be reduced, such that the numerosity perceived in later tests should also be reduced. 
The very nature of our approach leads to further testable predictions. The most prominent prediction, which applies to the basic model and all its variants, is based on its intimate relation to topology and in particular to the topological concept of connectivity: On the one hand, many possible nonnumeric changes of a configuration of objects are predicted to have only a small influence, even if they are quite dramatic with respect to basic signal-level properties. These are, for example, drastic changes of the sizes of the objects, or substantial alterations of their shapes (e.g., from a thin, elongated shape to a compact, round one). On the other hand, changes of the topology, and in particular changes of connectivity, should have a strong influence, even if the corresponding signal-level changes are very small. An example of this would be the connection of two big blobs by a thin line, where the line width can be assumed to be represented on a substantially smaller spatial scale than the blob size (Figure 8). Our model would predict a clear decrease of perceived numerosity in spite of the small signal-level change. In a model using area as an essential variable, the influence should be relatively weak, whereas a mechanism relying on aggregated contour length would predict an increase of perceived numerosity. The predictions of our model are supported by findings which show that visual perception is sensitive to topological quantities (Chen, 2005) and in particular that a change of topological connectivity affects visual numerosity estimation (Franconeri et al., 2009; He, Zhang, et al., 2009). 
As a last point for empirical tests, it should be remembered that the restriction to simply connected objects was motivated only by plausibility arguments, since no parallel algorithm is known for the computation of the invariant zeroth Betti number. It would thus be of special interest to perform experiments with human subjects in which exactly this property is manipulated (e.g., by determining how perceived numerosity is influenced by making “holes” in the objects). In this context we again note that there already exists evidence that human perception in general is sensitive to topological quantities, including a significant influence of holes on perception (Chen, 2005). 
As our approach is derived from formal mathematical considerations, it seems to generate an undesired prediction: There seems to be no obvious role for errors, and in particular not for Weber-like behavior. It can thus be regarded as important supporting evidence that, merely on the basis of quite natural assumptions about noise sources and without any explicit structural support (e.g., logarithmic transfer functions or gain-control mechanisms), the model does exhibit such Weber-like behavior. This has been confirmed by tests with an extended data set that will be made publicly available. The noisy model combined with a decision-making system was able to closely reproduce the number-discrimination abilities of human subjects: In the literature, Weber fractions of 0.174 (Piazza et al., 2004), 0.108 (Halberda & Feigenson, 2008), and 0.15 (Piazza et al., 2010) have been reported for adults. The presented parametrization of the noise levels results in Weber fractions between 0.150 and 0.197 for the six basic data sets. 
In conclusion, we have provided a formal analysis of the problem of numerosity. The essential and most basic result is that numerosity should be regarded as a mathematical invariant. Based on concepts from topology, we have also derived a basic model structure and several specific variants of this model. Its key property is given by the Gauss–Bonnet formula, which provides the desired invariance properties by the parallel integration of local curvature measures. These in turn are based on neurophysiologically highly plausible operations: directional derivatives (oriented receptive fields), nonlinear AND-like combinations related to extraclassical receptive-field properties, and divisive operations similar to cortical gain-control (normalization) mechanisms. The properties that turned up in our analysis are so basic from a mathematical point of view that it seems difficult to believe that there could be any mathematically reasonable model for the sense of number which is based on parallel computations but does not somehow relate to the invariance principles described here. It may thus be hoped that the conceptual framework suggested here can serve as a fruitful basis for future research into the basic cognitive capacity of numerosity perception. 
Acknowledgments
The original motivation for this research came in 1989, when Erhardt Barth, at that time a diploma student under the supervision of CZ, returned from a talk by Chen Lin on topological perception. This work was supported by Deutsche Forschungsgemeinschaft (SFB/TR8 Spatial Cognition, project A5-[ActionSpace]) and the German Federal Ministry for Economic Affairs and Energy (DLR project “CAUSE,” funding no. 50 NA 1505). 
Commercial relationships: none. 
Corresponding author: Christoph Zetzsche. 
Email: zetzsche@informatik.uni-bremen.de. 
Address: Cognitive Neuroinformatics, University of Bremen, Bremen, Germany. 
References
Abbott L., Varela J., Sen K., Nelson S. (1997, Jan 10). Synaptic depression and cortical gain control. Science, 275, 221–224.
Adelson E. H., Bergen J. R. (1985). Spatiotemporal energy models for the perception of motion. Journal of the Optical Society of America A, 2 (2), 284–299.
Allik J., Tuulmets T., Vos P. G. (1991). Size invariance in visual number discrimination. Psychological Research, 53 (4), 290–295, doi:10.1007/BF00920482.
Anobile G., Cicchini G. M., Burr D. C. (2014). Separate mechanisms for perception of numerosity and density. Psychological Science, 25 (1), 265–270, doi:10.1177/0956797613501520.
Barlow H. (2013). Linking minds and brains. Visual Neuroscience, 30, 207–217, doi:10.1017/S0952523813000291.
Barth E., Ferraro M., Zetzsche C. (2001). Global topological properties of images derived from local curvature features. In Arcelli C. Cordella L. P. Sanniti di Baja G. (Eds.) Visual form (pp. 285–294). Berlin Heidelberg, Germany: Springer.
Bell J., Gheorghiu E., Kingdom F. (2009). Orientation tuning of curvature adaptation reveals both curvature-polarity-selective and non-selective mechanisms. Journal of Vision, 9 (12): 3, 1–11, doi:10.1167/9.12.3. [PubMed] [Article]
Blakemore C., Over R. (1974). Curvature detectors in human vision. Perception, 3 (1), 3–7.
Bonn C. D., Cantlon J. F. (2012). The origins and structure of quantitative concepts. Cognitive Neuropsychology, 29 (1–2), 149–173, doi:10.1080/02643294.2012.707122.
Brannon E. M. (2006). The representation of numerical magnitude. Current Opinion in Neurobiology, 16 (2), 222–229, doi:10.1016/j.conb.2006.03.002.
Buzsáki G., Mizuseki K. (2014). The log-dynamic brain: How skewed distributions affect network operations. Nature Reviews Neuroscience, 15 (4), 264–278, doi:10.1038/nrn3687.
Cappelletti M., Didino D., Stoianov I., Zorzi M. (2014). Number skills are maintained in healthy ageing. Cognitive Psychology, 69, 25–45.
Carandini M., Heeger D. J. (2012). Normalization as a canonical neural computation. Nature Reviews Neuroscience, 13 (1), 51–62, doi:10.1038/nrn3136.
Chang C.-I. (2010). Multiparameter receiver operating characteristic analysis for signal detection and classification. IEEE Sensors Journal, 10 (3), 423–442, doi:10.1109/JSEN.2009.2038120.
Chen L. (2005). The topological approach to perceptual organization. Visual Cognition, 12 (4), 553–637, doi:10.1080/13506280444000256.
Dakin S. C., Tibber M. S., Greenwood J. A., Kingdom F. A. A., Morgan M. J. (2011). A common visual metric for approximate number and density. Proceedings of the National Academy of Sciences, USA, 108 (49), 19552–19557, doi:10.1073/pnas.1113195108.
Daugman J. G. (1985). Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. Journal of the Optical Society of America A, 2 (7), 1160–1169, doi:10.1364/JOSAA.2.001160.
Dayan P., Abbott L. F. (2001). Theoretical neuroscience: Computational and mathematical modeling of neural systems. Cambridge, MA: MIT Press.
Dehaene S., Changeux J. P. (1993). Development of elementary numerical abilities: A neuronal model. Journal of Cognitive Neuroscience, 5 (4), 390–407, doi:10.1162/jocn.1993.5.4.390.
DeWind N. K., Adams G. K., Platt M. L., Brannon E. M. (2015). Modeling the approximate number system to quantify the contribution of visual stimulus features. Cognition, 142, 247–265, doi:10.1016/j.cognition.2015.05.016.
Dobbins A., Zucker S. W., Cynader M. S. (1987). Endstopped neurons in the visual cortex as a substrate for calculating curvature. Nature, 329 (6138), 438–441.
Durgin F. H. (2008). Texture density adaptation and visual number revisited. Current Biology, 18 (18), R855–R856, doi:10.1016/j.cub.2008.07.053.
Fawcett T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27 (8), 861–874, doi:10.1016/j.patrec.2005.10.010.
Feigenson L., Dehaene S., Spelke E. (2004). Core systems of number. Trends in Cognitive Sciences, 8 (7), 307–314, doi:10.1016/j.tics.2004.05.002.
Franconeri S., Bemis D., Alvarez G. (2009). Number estimation relies on a set of segmented objects. Cognition, 113 (1), 1–13.
Gallistel C. R., Gelman R. (1992). Preverbal and verbal counting and computation. Cognition, 44 (1–2), 43–74, doi:10.1016/0010-0277(92)90050-R.
Gallistel C. R., Gelman R. (2000). Non-verbal numerical cognition: From reals to integers. Trends in Cognitive Sciences, 4 (2), 59–65, doi:10.1016/S1364-6613(99)01424-2.
Geisler W. S., Albrecht D. G. (1992). Cortical neurons: Isolation of contrast gain control. Vision Research, 32 (8), 1409–1410.
Gray R. M., Neuhoff D. L. (1998). Quantization. IEEE Transactions on Information Theory, 44 (6), 2325–2383.
Gross H. J., Pahl M., Si A., Zhu H., Tautz J., Zhang S. (2009). Number-based visual generalisation in the honeybee. PloS One, 4 (1), e4263, doi:10.1371/journal.pone.0004263.
Grossberg S. (1988). Nonlinear neural networks: Principles, mechanisms, and architectures. Neural Networks, 1 (1), 17–61, doi:10.1016/0893-6080(88)90021-4.
Halberda J., Feigenson L. (2008). Developmental change in the acuity of the “number sense”: The approximate number system in 3-, 4-, 5-, and 6-year-olds and adults. Developmental Psychology, 44 (5), 1457–1465, doi:10.1037/a0012682.
Halberda J., Mazzocco M. M., Feigenson L. (2008). Individual differences in non-verbal number acuity correlate with maths achievement. Nature, 455 (7213), 665–668, doi:10.1038/nature07246.
Hancock S., Peirce J. W. (2008). Selective mechanisms for simple contours revealed by compound adaptation. Journal of Vision, 8 (7): 11, 1–10, doi:10.1167/8.7.11. [PubMed] [Article]
Hashemi-Nezhad M., Lyon D. C. (2012). Orientation tuning of the suppressive extraclassical surround depends on intrinsic organization of V1. Cerebral Cortex, 22 (2), 308–326.
He L., Chao Y., Suzuki K. (2007). A linear-time two-scan labeling algorithm. In IEEE international conference on image processing, Vol. 5 (pp. V-241–V-244). IEEE.
He L., Chao Y., Suzuki K., Wu K. (2009). Fast connected-component labeling. Pattern Recognition, 42 (9), 1977–1987, doi:10.1016/j.patcog.2008.10.013.
He L., Zhang J., Zhou T., Chen L. (2009). Connectedness affects dot numerosity judgment: Implications for configural processing. Psychonomic Bulletin & Review, 16 (3), 509–517, doi:10.3758/PBR.16.3.509.
Heeger D. J. (1992). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9 (2), 181–197, doi:10.1017/S0952523800009640.
Hegde J., Felleman D. (2007). Reappraising the functional implications of the primate visual anatomical hierarchy. The Neuroscientist, 13 (5), 416–421, doi:10.1177/1073858407305201.
Hornik K., Stinchcombe M., White H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2 (5), 359–366, doi:10.1016/0893-6080(89)90020-8.
Hurewitz F., Gelman R., Schnitzer B. (2006). Sometimes area counts more than number. Proceedings of the National Academy of Sciences, USA, 103 (51), 19599–19604, doi:10.1073/pnas.0609485103.
Ito M., Komatsu H. (2004). Representation of angles embedded within contour stimuli in area V2 of macaque monkeys. The Journal of Neuroscience, 24 (13), 3313–3324.
Kaski S., Kohonen T. (1994). Winner-take-all networks for physiological models of competitive learning. Neural Networks, 7 (6–7), 973–984, doi:10.1016/S0893-6080(05)80154-6.
Kaufman E. L., Lord M. W., Reese T. W., Volkmann J. (1949). The discrimination of visual number. The American Journal of Psychology, 62 (4), 498–525.
Kesavan S. (2006). Symmetrization & applications (Vol. 3). Singapore: World Scientific.
Kluth T., Zetzsche C. (2014). Spatial numerosity: A computational model based on a topological invariant. In Freksa C. Nebel B. Hegarty M. Barkowsky T. (Eds.) Spatial cognition IX (Vol. 8684, pp. 237–252). Heidelberg, Germany: Springer International Publishing.
Kluth T., Zetzsche C. (2015). A topological view on numerosity. Manuscript in preparation.
Koch C., Segev I. (2000). The role of single neurons in information processing. Nature Neuroscience, 3, 1171–1177.
Koenderink J. J., van Doorn A. J. (1990). Receptive field families. Biological Cybernetics, 63 (4), 291–297, doi:10.1007/BF00203452.
Koenderink J. J., van Doorn A. J. (2003). Shape and shading. In Chalupa L. M. Werner J. S. (Eds.) The visual neurosciences (pp. 1090–1105). Cambridge, MA: MIT Press.
Kuffler S. W. (1953). Discharge patterns and functional organization of mammalian retina. Journal of Neurophysiology, 16 (1), 37–68.
Lampl I., Ferster D., Poggio T., Riesenhuber M. (2004). Intracellular measurements of spatial integration and the max operation in complex cells of the cat primary visual cortex. Journal of Neurophysiology, 92 (5), 2704–2713.
Lindeberg T. (2013). A computational theory of visual receptive fields. Biological Cybernetics, 107 (6), 589–635, doi:10.1007/s00422-013-0569-z.
Mandler G., Shebo B. J. (1982). Subitizing: An analysis of its component processes. Journal of Experimental Psychology: General, 111 (1), 1–22, doi:10.1037/0096-3445.111.1.1.
Marr D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco: W.H. Freeman and Company.
Marr D., Hildreth E. (1980). Theory of edge detection. Proceedings of the Royal Society of London B: Biological Sciences, 207 (1167), 187–217.
Marr D., Ullman S. (1981). Directional selectivity and its use in early visual processing. Proceedings of the Royal Society of London B: Biological Sciences, 211 (1183), 151–180.
Martens J.-B. (1990). The Hermite transform-theory. IEEE Transactions on Acoustics, Speech and Signal Processing, 38 (9), 1595–1606, doi:10.1109/29.60086.
Mead C., Ismail M. (2012). Analog VLSI implementation of neural systems. New York: Springer Science & Business Media.
Meck W. H., Church R. M. (1983). A mode control model of counting and timing processes. Journal of Experimental Psychology: Animal Behavior Processes, 9 (3), 320–334, doi:10.1037/0097-7403.9.3.320.
Mel B. W. (1993). Synaptic integration in an excitable dendritic tree. Journal of Neurophysiology, 70 (3), 1086–1101.
Morgan M. J., Raphael S., Tibber M. S., Dakin S. C. (2014). A texture-processing model of the visual sense of number. Proceedings of the Royal Society B: Biological Sciences, 281 (1790), 20141137, doi:10.1098/rspb.2014.1137.
Nieder A., Freedman D. J., Miller E. K. (2002, Sept 6). Representation of the quantity of visual items in the primate prefrontal cortex. Science, 297, 1708–1711, doi:10.1126/science.1072493.
Nieder A., Miller E. K. (2003). Coding of cognitive magnitude: Compressed scaling of numerical information in the primate prefrontal cortex. Neuron, 37 (1), 149–157.
Palmer S., Rock I. (1994). Rethinking perceptual organization: The role of uniform connectedness. Psychonomic Bulletin & Review, 1 (1), 29–55, doi:10.3758/BF03200760.
Park J., DeWind N. K., Woldorff M. G., Brannon E. M. (2015). Rapid and direct encoding of numerosity in the visual stream. Cerebral Cortex, 26 (2), 748–763, doi:10.1093/cercor/bhv017.
Phong B. T. (1975). Illumination for computer generated pictures. Communications of the ACM, 18 (6), 311–317, doi:10.1145/360825.360839.
Piazza M., Facoetti A., Trussardi A. N., Berteletti I., Conte S., Lucangeli D., Zorzi M. (2010). Developmental trajectory of number acuity reveals a severe impairment in developmental dyscalculia. Cognition, 116 (1), 33–41, doi:10.1016/j.cognition.2010.03.012.
Piazza M., Izard V., Pinel P., Le Bihan D., Dehaene S. (2004). Tuning curves for approximate numerosity in the human intraparietal sulcus. Neuron, 44 (3), 547–555, doi:10.1016/j.neuron.2004.10.014.
Piazza M., Mechelli A., Butterworth B., Price C. J. (2002). Are subitizing and counting implemented as separate or functionally overlapping processes? NeuroImage, 15 (2), 435–446, doi:10.1006/nimg.2001.0980.
Raphael S., Dillenburger B., Morgan M. (2013). Computation of relative numerosity of circular dot textures. Journal of Vision, 13 (2): 17, 1–11, doi:10.1167/13.2.17. [PubMed] [Article]
Reich D. S., Mechler F., Purpura K. P., Victor J. D. (2000). Interspike intervals, receptive fields, and information encoding in primary visual cortex. The Journal of Neuroscience, 20 (5), 1964–1974.
Resnikoff H. L., Wells R. O.,Jr. (2015). Mathematics in civilization. Mineola, NY: Courier Dover Publications.
Reynolds J. H., Chelazzi L. (2004). Attentional modulation of visual processing. Annual Review of Neuroscience, 27, 611–647, doi:10.1146/annurev.neuro.26.041002.131039.
Roitman J. D., Brannon E. M., Platt M. L. (2007). Monotonic coding of numerosity in macaque lateral intraparietal area. PLoS Biology, 5 (8), e208, doi:10.1371/journal.pbio.0050208.
Ross J. (2003). Visual discrimination of number without counting. Perception, 32 (7), 867–870, doi:10.1068/p5029.
Ross J., Burr D. C. (2010). Vision senses number directly. Journal of Vision, 10 (2): 10, 1–8, doi:10.1167/10.2.10. [PubMed] [Article]
Saito H.-A., Tanaka K., Fukada Y., Oyamada H. (1988). Analysis of discontinuity in visual contours in area 19 of the cat. The Journal of Neuroscience, 8 (4), 1131–1143.
Salinas E., Thier P. (2000). Gain modulation: A major computational principle of the central nervous system. Neuron, 27 (1), 15–21, doi:10.1016/S0896-6273(00)00004-0.
Schlick C. (1993). A customizable reflectance model for everyday rendering. In Cohen M. Puech C. Sillion F. (Eds.) Fourth Eurographics Workshop on Rendering (pp. 73–83). Geneva, Switzerland: European Association for Computer Graphics.
Schwartz O., Simoncelli E. P. (2001). Natural signal statistics and sensory gain control. Nature Neuroscience, 4 (8), 819–825.
Serre T., Wolf L., Poggio T. (2005). Object recognition with features inspired by visual cortex. In Schmid C. Soatto S. Tomasi C. (Eds.) IEEE Computer Society conference on computer vision and pattern recognition, Vol. 2 (pp. 994–1000). IEEE.
Starkey P., Spelke E. S., Gelman R. (1990). Numerical abstraction by human infants. Cognition, 36 (2), 97–127, doi:10.1016/0010-0277(90)90001-Z.
Stoianov I., Zorzi M. (2012). Emergence of a “visual number sense” in hierarchical generative models. Nature Neuroscience, 15 (2), 194–196, doi:10.1038/nn.2996.
Strauss M. S., Curtis L. E. (1981). Infant perception of numerosity. Child Development, 52 (4), 1146–1152, doi:10.2307/1129500.
Suzuki K., Horiba I., Sugie N. (2003). Linear-time connected-component labeling based on sequential local operations. Computer Vision and Image Understanding, 89 (1), 1–23, doi:10.1016/S1077-3142(02)00030-9.
Trick L. M., Pylyshyn Z. W. (1994). Why are small and large numbers enumerated differently? A limited-capacity preattentive stage in vision. Psychological Review, 101 (1), 80–102, doi:10.1037/0033-295X.101.1.80.
Umbaugh S. E. (2005). Computer imaging: Digital image analysis and processing. Boca Raton, FL: CRC Press.
Verguts T., Fias W. (2004). Representation of number in animals and humans: A neural model. Journal of Cognitive Neuroscience, 16 (9), 1493–1504, doi:10.1162/0898929042568497.
Wallis G., Bülthoff H. (1999). Learning to recognize objects. Trends in Cognitive Sciences, 3 (1), 22–31, doi:10.1016/S1364-6613(98)01261-3.
Walsh V. (2003). A theory of magnitude: Common cortical metrics of time, space and quantity. Trends in Cognitive Sciences, 7 (11), 483–488, doi:10.1016/j.tics.2003.09.002.
Whalen J., Gallistel C. R., Gelman R. (1999). Nonverbal counting in humans: The psychophysics of number representation. Psychological Science, 10 (2), 130–137, doi:10.1111/1467-9280.00120.
Xu F., Spelke E. S. (2000). Large number discrimination in 6-month-old infants. Cognition, 74 (1), B1–B11, doi:10.1016/S0010-0277(99)00066-9.
Young R. A. (1987). The Gaussian derivative model for spatial vision: I. Retinal mechanisms. Spatial Vision, 2 (4), 273–293, doi:10.1163/156856887X00222.
Young R. A., Lesperance R. M. (2001). The Gaussian derivative model for spatial-temporal vision: II. Cortical data. Spatial Vision, 14 (3), 321–389, doi:10.1163/156856801753253591.
Zahn C. T., Roskies R. Z. (1972). Fourier descriptors for plane closed curves. IEEE Transactions on Computers, C-21 (3), 269–281.
Zetzsche C., Barth E. (1990a). Fundamental limits of linear filters in the visual processing of two-dimensional signals. Vision Research, 30 (7), 1111–1117.
Zetzsche C., Barth E. (1990b). Image surface predicates and the neural encoding of two-dimensional signal variations. In Rogowitz B. E. Allebach J. P. (Eds.) Human vision and electronic imaging: Models, methods, and applications, Vol. 1249 (pp. 160–177). Bellingham, WA: SPIE.
Zetzsche C., Gadzicki K., Kluth T. (2013). Statistical invariants of spatial form: From local AND to numerosity. In Kutz O. Bhatt M. Borgo S. Santos P. (Eds.) Proceedings of the Second Interdisciplinary Workshop The Shape of Things (Vol. 1007, pp. 163–172). Aachen, Germany: CEUR-WS.org.
Zetzsche C., Nuding U. (2005). Nonlinear and higher-order approaches to the encoding of natural scenes. Network: Computation in Neural Systems, 16 (2–3), 191–221.
Zetzsche C., Roehrbein F. (2001). Nonlinear and extra-classical receptive field properties and the statistics of natural scenes. Network: Computation in Neural Systems, 12 (3), 331–350, doi:10.1080/net.12.3.331.350.
Appendix
Optimal estimator
In order to relate the internal representation of numerosity to behavioral results, we need models for decision making. These models must connect the analog quantity n produced by the proposed noisy model with a decision in the specific task. For the same–different task and the larger–smaller task, both of which require a binary decision, we used the receiver operating characteristic (ROC; Chang, 2010; Fawcett, 2006) from signal-detection theory to obtain the optimal parameters for the estimator. Each parameter setup defines one point in the ROC space, the space spanned by the false positive rate and the true positive rate. The higher the true positive rate and, simultaneously, the lower the false positive rate, the better the parametrization of the classifier; this corresponds to maximizing the area under the curve traced out by the classifier in ROC space. 
In the same–different task, the classifier is defined as 

Ddifferent(n) = 0 if n ∈ [lb, lb + d], and 1 otherwise, 

where lb defines the left bound and d the length of the detection interval (a response of 1 means "different"). 
In the larger–smaller task, the classifier is defined as 

Dlarger(n) = 1 if n > t, and 0 otherwise, 

where t defines the threshold used to distinguish between smaller and larger. 
In both cases the optimal parameters of the estimator were determined so as to maximize performance in ROC space on the data set for one fixed reference number Ntuned. The parameters of the detectors can be found in Table A4. The optimal estimators obtained on data set TR were then used to compute the statistics on data set R for the behavioral tasks, shown as squares, triangles, and rhombuses in Figure 11. 
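To make the selection procedure concrete, the following minimal Python sketch performs such a search for the same–different detector. It is an illustration under our own assumptions, not the exact routine used for Table A4: n_est holds noisy model outputs, n_true the true numerosities, and the score TPR − FPR (the distance of the classifier's ROC point from the chance diagonal) stands in for the area criterion.

import numpy as np

def same_different_rates(n_est, n_true, n_tuned, lb, d):
    # "Positive" means responding "different"; the ground truth is
    # "different" whenever the true number deviates from n_tuned.
    responded_diff = (n_est < lb) | (n_est > lb + d)
    is_diff = n_true != n_tuned
    return np.mean(responded_diff[is_diff]), np.mean(responded_diff[~is_diff])

def fit_interval_classifier(n_est, n_true, n_tuned, grid):
    # Grid search over candidate (lb, d) pairs; keep the ROC point
    # farthest above the chance diagonal (largest TPR - FPR).
    best, best_score = None, -np.inf
    for lb, d in grid:
        tpr, fpr = same_different_rates(n_est, n_true, n_tuned, lb, d)
        if tpr - fpr > best_score:
            best, best_score = (lb, d), tpr - fpr
    return best

The larger–smaller detector can be selected in the same way, with the interval test replaced by the threshold test n > t.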
Fitting functions
We fitted the behavior of the estimators to a continuous function depending on the internal Weber fraction w, as described by Piazza et al. (2004). This fitting method was also used by Stoianov and Zorzi (2012) to analyze their model, so the results are directly comparable. Specifically, we computed the conditional relative frequencies of the responses of the optimal estimators Ddifferent and Dlarger given the true numerosities Ntrue. For the same–different task we fitted these data points to a function of the log-numerosity difference based on the standard error function erf, in which c controls the internal representation of the reference numerosity Ntuned, δ controls the variance of the corresponding probability density function, and w is the internal Weber fraction. 
For the larger–smaller task we fitted the results of the optimal estimator to a second function of the same type, with the functions and parameters as just described. In order to obtain representative fits for the number-discrimination ability (Piazza et al., 2004), we used only numerosities close to the reference numerosity for the fit—i.e., all N whose ratio to Ntuned lies below a fixed bound. In both cases the parameters were estimated by a standard least-squares minimization of the residual. The fitted curves obtained from data set R are illustrated as continuous lines in Figure 11. The respective parameters can be found in Table A3. 
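As an illustration of this fitting step, the sketch below fits the larger–smaller data with a standard least-squares routine. The psychometric function used here, a cumulative Gaussian in log space, is a conventional choice consistent with the parameters named above, not necessarily the exact expression of Piazza et al. (2004); the data values and names are made up.

import numpy as np
from scipy.optimize import curve_fit
from scipy.special import erf

def p_larger(N, c, w, n_tuned=16.0):
    # Probability of responding "larger": cumulative Gaussian over log N,
    # centered on c * n_tuned, with spread given by the Weber fraction w.
    z = (np.log(N) - np.log(c * n_tuned)) / (np.sqrt(2.0) * w)
    return 0.5 * (1.0 + erf(z))

N_vals = np.array([8, 10, 12, 14, 16, 20, 24, 28, 32], dtype=float)
p_obs = np.array([0.02, 0.08, 0.20, 0.35, 0.50, 0.78, 0.92, 0.97, 0.99])

(c_fit, w_fit), _ = curve_fit(p_larger, N_vals, p_obs, p0=(1.0, 0.2))
print(c_fit, w_fit)   # fitted bias c and internal Weber fraction w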
Noise analysis
In order to evaluate the noise behavior of the input unit and the linear filter units with respect to their neural plausibility, the peak SNR (PSNR) is derived from theory as follows. The input signal is a function l : [−1, 1] × [−1, 1] → [0, 1] with amplitude A(l) = 1, such that the PSNR becomes 

PSNR(l) = A(l)²/σin² = 1/σin². 

The implementation of the derivatives strongly depends on the discretization and thus on the step sizes Δx and Δy. From the differential quotient we can derive the bound 

|lx| ≈ |l(x + Δx, y) − l(x, y)|/Δx ≤ 1/Δx. 

The upper bound for ly can be derived analogously. The second derivative in the x-direction is bounded by 

|lxx| ≈ |l(x + Δx, y) − 2l(x, y) + l(x − Δx, y)|/Δx² ≤ 2/Δx². 

The upper bounds for lyy and lxy can be derived analogously. The linear filter units are computed by a convolution operation with a Gaussian function g (normalized such that ‖g‖1 = 1), so we need an upper bound for the filter output. Using Young's inequality, we get for lx 

‖lx ∗ g‖∞ ≤ ‖lx‖∞ ‖g‖1 ≤ 1/Δx. 

The peak-to-peak signal amplitude is then twice the maximum norm of the respective filter unit—i.e., 

A(lx ∗ g) = 2‖lx ∗ g‖∞ ≤ 2/Δx. 

The PSNR then becomes 

PSNR(lx ∗ g) = A(lx ∗ g)²/σlin² ≤ 4/(Δx² σlin²). 

The PSNR formulas for all signals are summarized in Table A1. 
Table A1
PSNR and SNR. Notes: The PSNR and SNR functions and values for the input and linear filter stages are shown for the theoretical setup and the three data sets TR, R, and RC.
In contrast to the PSNR, which can easily be derived from theory, we also consider the SNR of the data sets used for the analysis with respect to the Weber fraction. The SNR is determined by the expected integral of the squared signal over each data set. For the function l and all images in data set R it is given by 

SNR(l) = E[∫ l(x, y)² d(x, y)] / (|Ω| σin²), 

where Ω denotes the image domain and the expectation is taken over the data set. The values for the different signal types and the different data sets can also be found in Table A1. 
In order to relate the given SNR values to the information encoded by neurons, we determine the rate of a uniform quantizer for the given SNR values. As a good approximation we can use the "6-dB-per-bit rule" (Gray & Neuhoff, 1998), such that the rate R for a signal l is given approximately by 

R(l) ≈ 10 log10(SNR(l)) / 6.02 bits. 
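A back-of-the-envelope version of this conversion, using the input stage with σin = 0.4 (the input-noise level adopted later) and a hypothetical helper name:

import numpy as np

def rate_from_snr(snr_linear):
    # 6-dB-per-bit rule: each bit of a uniform quantizer buys ~6.02 dB.
    return 10.0 * np.log10(snr_linear) / 6.02

sigma_in = 0.4
psnr = 1.0 / sigma_in ** 2        # PSNR = A(l)^2 / sigma_in^2 with A(l) = 1
print(rate_from_snr(psnr))        # ~1.3 bits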
We went one step further and analyzed various noise-parameter combinations to identify suitable noise parameters. The resulting Weber fractions on the rectangle data set (R) are illustrated in Figure A1. In both tasks we find a similar distribution of Weber fractions. The only difference can be observed for high noise levels in the upper right corner of the right-hand illustrations in Figure A1. This could be caused by numerical instability of the fitting algorithm, which relies on the resulting error in the numerosity quantity obtained from the model. However, the Weber fraction for these noise parameters lies beyond the Weber fractions reported for humans (children ∼ 0.3). In both tasks the distribution in the vertical direction shows a discontinuity for low linear noise σlin. This discontinuity is a result of the noncontinuous luminance function in the data set: The standard deviation of the Gaussian filter is not sufficient to compensate for the discontinuity in the stimulus signal, so the regularity required for the model computation (i.e., differentiability) is not guaranteed. This implies that the input noise stabilizes the system to a certain degree. Changing the standard deviation of the Gaussian filter should move the discontinuity in the Weber fraction distribution in Figure A1; in particular, an increase of the standard deviation shifts this line toward the bottom of the illustration—i.e., toward smaller σin values. 
Figure A1
The distribution of Weber fractions over various noise combinations of input noise σin and linear filter output noise σlin is illustrated for (a) the same–different task and (b) the larger–smaller task. The distributions are shown for σlin ∈ [0, 1] (left) and σlin ∈ [1, 5] (right). The Weber fractions are a result of the evaluation of data set R with the parameters for the optimal estimators obtained from data set TR. The black line shows a constant Weber fraction of w = 0.15 within the shown distributions. The red and the blue circle highlight the parameter instantiations used for Figure A2.
We also highlighted a contour line of constant Weber fraction w = 0.15 to demonstrate that various parameter combinations can result in the same fraction value. The qualitative difference between two such cases (red and blue circles) is illustrated for the larger–smaller task in Figure A2. A smaller noise parameter at the linear filters combined with higher noise at the input yields quantitatively the same overall Weber fraction, but qualitatively it does not show the reported log-normally distributed noise behavior for all numbers (see Figure A2a). This supports the earlier suggestion that the noise at the linear filter outputs which are fed into the multiplication is the essential reason for the log-normal behavior of the model, in line with the literature (Buzsáki & Mizuseki, 2014). 
Figure A2
The response of the optimal estimator in the case of the larger–smaller task for different noise-parameter combinations (σin, σlin), on a log scale. The noise-parameter combinations are (a) (0.8, 0) and (b) (0.4, 2), resulting in similar overall Weber fractions of (a) w = 0.172 and (b) w = 0.169.
Model comparison
On the general level, our proposed model differs from other image-based models (Cappelletti et al., 2014; Dakin et al., 2011; Dehaene & Changeux, 1993; Morgan et al., 2014; Stoianov & Zorzi, 2012) in the motivation behind its design. The central motivation for our model is to find a sound mathematical basis for the invariance properties required by an ideal numerosity mechanism. As a second step, we then considered how these ideal principles could be implemented and approximated by neurobiologically plausible mechanisms. The other models have different goals or are based on a different type of reasoning. The model of Dehaene and Changeux (1993) was an early attempt that captured the image-processing aspect to a certain degree. However, it is only a one-dimensional model and thus cannot deal with two-dimensional shapes and the associated invariance properties; we therefore do not include it in the following detailed comparison. The contrast energy model (Dakin et al., 2011; Morgan et al., 2014) is motivated by empirical observations in psychophysical experiments which suggested a close connection between numerosity and density (Durgin, 2008). The actual model has been derived from considerations of how a density mechanism could be provided by established filter-based texture computations (Dakin et al., 2011; Morgan et al., 2014). The model of Stoianov and Zorzi (2012) is based on a deep-learning neural network, which was then abstracted by a computational analysis to a spatial filter model with point nonlinearities (Cappelletti et al., 2014; Stoianov & Zorzi, 2012). This model is henceforth designated the network model. 
In principle, there are two strategies for comparing the other models with ours. One is to identify the invariance properties underlying the other models and compare them with the invariance properties of our model. The other is to map the models onto a common framework and compare the model components within this framework. In the model comparison provided in the Discussion we used the first strategy; here we pursue the second, the common-framework approach. 
In Table A2 the three models are split into their operations at different stages or layers. From this representation it becomes apparent that all three models use similar basic computations. They have a similar first layer, consisting of linear filter operations (derivatives and low-pass filtering) and the spatial pooling of local luminance signals into an aggregated cumulative area. The contrast energy model and the network model then apply a standard nonlinear transfer function (an absolute-value function and a sigmoid, respectively). In contrast, our proposed model requires a nonlinear AND-like (multiplicative) interaction. (As described in the main text, this could be realized in neural hardware by a variety of mechanisms.) 
Table A2
Model comparison by operations. Notes: The different models (left to right) are split into their basic operations (top to bottom). The operations are split into layers, with a linear operation followed by a nonlinear one (NL). The contrast energy model is by Dakin et al. (2011); the network model is by Stoianov and Zorzi (2012).
In the second layer, the contrast energy model and the network model perform a spatial integration (the first a global and the second a local integration). Our model also performs such a spatial integration, but only as the last processing step on its top layer. On the second layer, our model performs a linear combination of different local features (different terms of the curvature computation). This combination of different local features is one distinction from the other models, which both use only one type of local spatial feature (a Laplacian band-pass feature or a Gaussian low-pass feature). 
The contrast energy model has a ratio operation between the two aggregated contrast energy values as its last operation. However, this operation is considered only for the incorporation of a moderate bias term for the influence of patch size (Dakin et al., 2011), whereas the basic numerosity variable is argued to be provided by the high-frequency filter output (Dakin et al., 2011; Morgan et al., 2014). The network model ends with a linear weighting over the corrected Gaussian filter outputs of the second layer by a linear classifier. 
In the following, we compare the models with respect to their general behavior. This requires some simplifying approximations, but we expect the basic trend to match the behavior of the complete models. For the contrast energy model, the authors suggest that the core computation is the spatial pooling of the local energy from a Laplacian high-frequency filter (Dakin et al., 2011; Morgan et al., 2014). The main argument is to use this to measure the amount of contour, since adding more objects to an image amounts to adding more contour (Morgan et al., 2014). For a given type of element—e.g., for squares of some fixed size—the estimate works perfectly: Changing the number of elements will proportionally change the pooled filter output. But how does this operation behave with respect to the invariance properties, i.e., if we change the shape of the elements? For some changes the influence will be relatively small, particularly if we use only convex elements, as is often done in numerosity experiments. For example, if we replace the squares with disks of the same area, the difference in contour length will be only about 10%. If we also use concave elements, the differences can become larger: For the L-shaped element shown in Figure A3.6, the difference becomes as large as 113% in terms of number units (Figure A4). The differences become even larger if we also consider elements of different size. For example, if we compare small squares of side length d/4 with large squares of side length d, a set of N large squares will appear to have the same numerosity as a set of 4N small squares. Thus the contrast energy model makes clear predictions about systematic deviations from the ideal invariance properties. How far these are also present in human bias terms remains to be determined (see Discussion). 
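The size bias can be checked with a few lines of arithmetic. We use raw contour length as a stand-in for the pooled high-frequency energy; this is only a proxy for the full model, and the square-versus-disk perimeter ratio it yields (~13%) is of the same order as the ~10% difference quoted above for the filter output.

import math

d, N = 1.0, 5
print(4 * d * N)             # pooled contour of N large squares (side d)
print(4 * (d / 4) * 4 * N)   # pooled contour of 4N small squares (side d/4)
# Both equal 4*N*d: 4N small squares masquerade as N large squares.

A = 1.0                      # square vs. disk of equal area A
print(4 * math.sqrt(A) / (2 * math.sqrt(math.pi * A)))   # ~1.13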
Figure A3
Data set of various stimuli with convex and concave objects with constant cumulative area A (in pixels) and numerosity N. All stimuli are binary images of size 100 × 100 pixels.
Figure A4
Responses of different models to stimuli illustrated in Figures A3 and A7. (a) The relative responses (difference from reference stimulus 1) to the convex and concave stimuli from Figure A3. (b) The relative responses (difference from reference stimulus A = 2,500) to stimuli with different cumulative area A (compare Figure A7a). The data set in Figure A7b was used to determine the mean difference in model response per numerosity. This value was used to determine ΔN. The dotted black lines mark the absolute difference 1 in numerosity with respect to the reference stimulus.
The analysis of the network model is somewhat more complicated. We make the following simplifying assumptions, which we think are valid as far as systematic deviations from invariance are concerned: First, the second-layer operation and the final weighting by the classifier are both linear. The weighting of the second-layer units at the different positions by the classifier can be expected to be similar, since these units all have basically the same status with respect to the computation of numerosity (any systematic differences between units at different positions would induce position-dependent biases). We thus assume that these two steps can be combined into one global spatial integration Σ(m,n)∈X Omn, where Omn are the local Gaussian-filter outputs after the sigmoidal point nonlinearity in the first layer and X is the set of indices of the discretized domain of the input image. We further assume that the aggregated luminance can also be rewritten as such a spatial integration, c = Σ(m,n)∈X k Imn. For this, we linearize the logarithm (whose argument can assume values only between 1 and 2) as 
log(1 + (1/cmax) Σ(m,n)∈X Imn) ≈ (1/cmax) Σ(m,n)∈X Imn = Σ(m,n)∈X k Imn, with k = 1/cmax. 
The final sum can then be written as Σ(m,n)∈X (Omn − k Imn)—i.e., it is a sum over a difference image between the original image and a nonlinear low-pass-filtered version of it. This analysis suggests that the constant k should be an absolute constant, which does not depend on further parameters, and that it should be chosen so as to avoid contributions from the interior object area. This is necessary because otherwise there would remain a systematic dependency of the numerosity estimate on the object area. Formally, k depends on the normalization constant cmax. This is an absolute constant within an experiment, but it is not quite clear how a subject can have knowledge of it or what this implies for the relation between different types of experiments. In the following, we consider k as an absolute constant which is chosen to produce a zero difference throughout all areas of constant luminance of the input image. We can then analyze the contributions from the remaining nonzero areas of the difference image. For a class of simple examples, patterns with straight edges and right-angled corners only (compare Figure A3.5 through A3.10), we know the correct solution from the Gauss–Bonnet theorem and from the Euler formula: The desired invariant can then be computed by simply summing up the number of (signed) corners as n = ¼Σᵢvᵢ, where each corner i contributes vᵢ = +1 or −1. For this special case the difference image should thus ideally generate contributions only from the corners, and no contributions from the straight borders. This is also evident from considering the case of enlarging such a figure, since then the number of corners remains constant but the length of the straight contours increases. Since the low-pass filtering smooths the corners, the network model will indeed provide the desired contribution from the corners. However, there also seems to be a nonvanishing contribution from the straight borders, such that there will be a dependency on the size as well as the shape of the elements. While the shape dependency is moderate for convex shapes, it becomes larger for concave shapes, since for those the total contour length is significantly larger (see Figures A3.6 and A4). 
In conclusion, both the contrast energy model and the network model can be seen to make use of similar basic local features which result from some version of nonlinear band-pass filtering. This operation produces the basic local features (explicitly represented by the rectified band-pass features in the case of the contrast energy model and implicitly represented by the subtraction of the cumulative area from the low-pass-filtered features in the network model). These local features are then spatially pooled (globally in one step in the contrast energy model and in two steps—first the second Gaussian low-pass filter and then the linear classifier—in the network model). Mapped to this type of architecture, the two models can be directly compared to our model, which can also be seen as consisting of the computation of nonlinear local features and a subsequent spatial pooling. From the mathematical basis of our model we know which local parts of the input image have to play which role if we want to achieve the desired invariance. We know, for example, that all constant image areas should not generate a systematic contribution. This enforces the mutual cancellation of the low-pass responses and the per-sample contribution of the cumulative area measurement in the interior object areas in the network model. If this is violated, a systematic influence of the cumulative area is unavoidable. We also know that the contributions from the contour regions (in case of binary images) should systematically vary with the contour curvature. In particular, straight contours should not generate any contribution, since otherwise a systematic dependency on the total contour length and on the size of the elements results. This is a problem for both the contrast energy model and the network model, since both produce nonvanishing contributions from straight contours. It should further be noted that it is generally not possible to trade off the false contributions from one class (say, area) against the false contributions of the other class (contour), since there exists no shape-independent relation between the two (see the arguments in the main text regarding the isoperimetric quotient). 
Invariance properties of binary objects
Assuming the special case of binary images or objects allows for the derivation of problem-specific computational principles. Here we describe the general rules for the invariant computation of numerosity for this particular signal class. The binary setup can be interpreted from different points of view. In the following we consider two possible interpretations. 
(a) We can see it as an inherently one-dimensional problem. The objects are bounded two-dimensional subsets of the xy-plane such that they can be represented by their one-dimensional boundary curve (compare Figure A5). In this case the one-dimensional counterpart of the Gauss–Bonnet theorem is the following standard corollary in differential geometry: 
Figure A5
Three objects illustrated by their one-dimensional boundary curves. (a) The ratio of squared perimeter and area increases from left to right. (b) The positive curvature (blue) and the negative curvature (red) of the boundary curve. (c) The area is constant for all objects, and the frequency of the sinusoidal grating along the boundary of a circle increases from left to right. The higher the frequency, the larger the contour length, resulting in a decreasing isoperimetric quotient from left to right.
Corollary 3
Let Γ be a closed, regular plane curve. Then we have 

∫Γ κ ds = 2πn, 

where κ is the curvature of the plane curve and n is an integer called the rotation index of the curve. 
As the rotation index is always 1 for simply connected objects and the integral operator is linear, the integral over the disjoint union of multiple boundary curves yields their number. The simple one-dimensional variant of the Gauss–Bonnet theorem thus tells us that we have to integrate the curvature of the boundary curve(s). In contrast to the isoperimetric quotient, the integral over the curvature is independent of the shape. In Figure A5a this quotient decreases from left to right, whereas the integral over the curvature remains constant. This is because the increase of positive curvature from left to right in Figure A5b is compensated by an additional contribution of negative curvature. 
If we assume a piecewise linear boundary, the integral over the curvature becomes a discrete sum over the arcs at the vertices. In particular, this setup includes the special case of objects having right-angled corners and otherwise vertical or horizontal straight boundary segments (see Figure A3.5 through A3.10 for examples). The corners can be divided into two classes: external corners, where the object area covers one quarter of a circular neighborhood, contributing an arc of π/2; and internal corners, where the object area covers three quarters, contributing −π/2. If we divide both sides of the corollary by 2π, we obtain a standard solution for determining the number of simply connected binary objects in computer vision (Umbaugh, 2005): The number of objects is the number of external rectangular corners minus the number of internal rectangular corners, divided by 4. 
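This corner-counting rule can be implemented directly with the classic bit-quad technique, which scans all 2 × 2 pixel windows. A minimal sketch, assuming objects without holes and without diagonally touching object pixels:

import numpy as np

def count_objects_bitquad(img):
    # Signed-corner count (external - internal) / 4 for a 0/1 image.
    p = np.pad(img, 1)   # pad so corners on the image border are seen
    # 2x2 window sums: exactly one set pixel marks an external corner
    # (arc +pi/2), exactly three mark an internal corner (arc -pi/2).
    q = p[:-1, :-1] + p[:-1, 1:] + p[1:, :-1] + p[1:, 1:]
    return (np.sum(q == 1) - np.sum(q == 3)) // 4

img = np.zeros((12, 12), dtype=int)
img[1:4, 1:5] = 1     # rectangle: 4 external corners
img[6:11, 2:5] = 1    # second rectangle
img[7:10, 7:10] = 1   # third rectangle
print(count_objects_bitquad(img))   # -> 3

An L-shaped object contributes five external and one internal corner, so it is still counted once: (5 − 1)/4 = 1.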
(b) We can consider the objects as two-dimensional surfaces, such that the approximate invariance requires no contribution from the interior area of the object, no contribution from straight edges, and a contribution from the remaining regions. The following approach, which matches all criteria in the binary-image case, is motivated by detecting the contour line of the region with the aid of the Laplace operator. If the contribution is zero on constant luminance regions and the computed quantity at the boundary allows a mapping to the curvature of the boundary curve, the integration over the whole domain should result in an estimate for numerosity. An easy way to detect the contour is to determine the zero-crossing line of the Laplacian-filtered image. In order to guarantee differentiability for the Laplace operator, the binary image must first be filtered with an appropriate filter function. One standard choice is the Gaussian function; however, it is not optimal for further consideration, as its support is infinite. We therefore assume a filter function g with compact circular support Ω around the origin. Furthermore, the Laplacian of g is positive on an inner disk Ωin ⊂ Ω and negative on Ωout = Ω \ Ωin, and the integral of Δg over Ω is assumed to be zero. Thus, the Laplacian of the convolution with the filter function yields zero on constant luminance regions. Furthermore, if we consider a straight edge with a length greater than or equal to the diameter of Ω, the line integral of the filter output in the direction orthogonal to the edge becomes zero for all boundary points whose neighborhood Ω contains only the straight edge. Thus, the integral over the whole domain of the Laplacian filter output already has two of the three desired properties: It produces no contributions at constant luminance regions and none at straight edges. That the spatial integration of the filter output alone does not result in the desired estimate for numerosity can be seen from the following equation: 

∫ (l ∗ Δg)(x) dx = (∫ l(x) dx)(∫ Δg(x) dx) = 0. 
Note that the approach does not depend on the Laplace operator: We can replace Δg by an arbitrary filter function h which has the same characteristic behavior on Ω = Ωin ∪ Ωout. In order to obtain contributions from the curved regions, a nonlinear function F : ℜ → ℜ is introduced. This function must be odd symmetric, so that the zero contribution on constant regions and on straight edges is preserved. The problem of estimating numerosity n then becomes the problem of finding an appropriate combination of a nonlinear function F and a filter function h such that 

∫ F((l ∗ h)(x)) dx ∝ n. 
Again, the success of this approach is due to its accuracy in approximating the curvature of the boundary curve. We assume that the boundary curve is piecewise linear with a minimum distance of ε between vertices and that the diameter of the region Ω is chosen smaller than ε (i.e., the radius δ < ε/2). In this particular case we know that integrating the linear filter output over the neighborhood with radius δ around a vertex results in zero. At an edge, the integration domains of positive and negative contributions, which sum to zero, are equal in size. At a vertex the sizes of these integration domains are no longer equal, but the overall integral remains zero. Thus a monotonic nonlinear function F fulfilling the previously formulated constraints is sufficient to guarantee a contribution from the vertices. If this output is proportional to the arc, the integration over the whole domain results in an estimate for numerosity. Figure A6 illustrates one choice of h and F which suffices to extract the desired information. We also applied the filter function from Figure A6a and the modified sigmoid function to the stimuli in Figure A7b, producing outputs of 133.73, 261.58, 399.45, and 533.46, from left to right. Relative to the output for the first stimulus with one object, the responses become 1.00, 1.95, 2.98, and 3.98. This simple choice of h and F thus results in the desired proportionality to numerosity. 
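A compact numerical sketch of this estimator is given below. It substitutes a Laplacian-of-Gaussian for the compactly supported filter h of Figure A6 and tanh for the modified sigmoid F; both stand-ins share the required structure (a zero-sum center–surround filter and an odd nonlinearity), and all parameter values are illustrative.

import numpy as np
from scipy.ndimage import gaussian_laplace

def estimate_numerosity(img, sigma=2.0, gain=4.0):
    # Odd nonlinearity of a zero-sum band-pass output, integrated over
    # the image: constant regions and straight edges contribute ~0,
    # corners survive.
    r = gaussian_laplace(img.astype(float), sigma)
    return np.sum(np.tanh(gain * r))

def squares(n):   # n identical, well-separated squares
    img = np.zeros((100, 100))
    for i in range(n):
        y, x = 10 + 22 * (i // 4), 10 + 22 * (i % 4)
        img[y:y + 8, x:x + 8] = 1.0
    return img

ref = estimate_numerosity(squares(1))
print([round(estimate_numerosity(squares(n)) / ref, 2) for n in (1, 2, 3, 4)])

Because the contributions are local and the objects identical and well separated, each square adds the same amount, so the printed ratios stay close to 1, 2, 3, and 4.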
Figure A6
(a) The filter function h used. The radius of the inner disk Ωin is 4 and the radius of the whole domain Ω is 7.24. The filter function was determined by a cubic spline interpolation such that its integral over the whole domain is zero. (b) An example binary stimulus. (c) The linear filter output l ∗ h. The integral over the whole domain in (c) is approximately −0.015. (d) A modified sigmoid function F(x) = −2[1/(1 + e^(−x)) − 1/2] is applied to the filter output. The area effect of the positive and negative contributions at the corners becomes apparent. This results in an integral over the whole domain in (d) of approximately 131.273.
Figure A7
Two data sets of various stimuli. (a) One rectangular object with increasing cumulative area A. (b) Multiple rectangular objects with constant cumulative area A. All stimuli are binary images of size 100 × 100 pixels; cumulative area A is in pixels.
In conclusion, for binary images, approximating the curvature of the boundary curve and integrating this quantity results in an estimate for numerosity. Furthermore, the fundamental principle which allows the estimation of numerosity from binary images is the invariance property provided by the Gauss–Bonnet theorem. 
Data sets
We generated different data sets using the method described by Stoianov and Zorzi (2012), adapted to image size. The 100 × 100–pixel binary images contained up to 32 randomly placed, nonoverlapping objects of variable surface area and shape. The cumulative area of the objects was varied from 288 to 2,304 pixels in steps of 288, resulting in eight levels of cumulative area. For each tuple of number and cumulative area, 200 images were computed. Each data set thus consists of 51,200 binary images of 100 × 100 pixels, covering 32 different numbers and eight levels of cumulative area. The data sets are distinguished by the kind of shape: rectangles (R), rectangles with smoothed corners (RC), and circles (C). In addition, we generated a training data set (TR) consisting of rectangles to obtain the parameters for the optimal estimators. 
Furthermore, we generated additional data sets with rectangular concave objects differing in their ratio q = A/L2, where A is the area of a single object and L is the length of its contour. The cumulative area and the number are controlled as described for data sets TR, R, RC, and C. Given the ratio qsquare = 1/16, which is the ratio of an arbitrary square, the data sets are generated with ratios q = 90% qsquare (C1), 80% qsquare (C2), and 70% qsquare (C3). 
We also generated data sets with randomly shaped objects by adding a randomly determined linear combination of sine functions of different frequencies to the radius of circles (Zahn & Roskies, 1972). A single object O ⊂ ℜ² around the origin with radius r0 is thus defined in polar coordinates by 

O = {(r, φ) : 0 ≤ r ≤ r0 + Σk=1..d ωk sin(kφ + θ)}, 

where θ is randomly chosen from the interval [0, 2π] and d ∈ ℕ determines the maximum frequency added to the boundary. The weights ωk are drawn randomly from a uniform distribution over [0, 1] and subsequently scaled such that their sum becomes 50% of r0. We generated data sets with different image sizes containing up to 32 randomly placed objects and two levels of cumulative area, 288 and 576 pixels, for images of size 100 × 100 pixels; the cumulative area was scaled with respect to image size. The data sets cover randomly shaped objects (d = 5) in images of size 100 × 100 (RA), 200 × 200 (RA200), and 400 × 400 (RA400) pixels. Sample images for all data sets are illustrated in Figure A8. All data sets can be found online (http://www.informatik.uni-bremen.de/cog_neuroinf/en/datasets). 
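The construction of a single object can be sketched as follows; the rasterization details (pixel grid, centering, a single shared phase θ) are our own simplifying assumptions.

import numpy as np

def random_shape_mask(size=100, r0=10.0, d=5, seed=None):
    # Circle of radius r0 whose radius is perturbed by a random sine
    # series up to frequency d (Zahn & Roskies, 1972, style boundary).
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi)
    w = rng.uniform(0.0, 1.0, size=d)
    w *= 0.5 * r0 / w.sum()   # scale weights so their sum is 50% of r0
    yy, xx = np.mgrid[0:size, 0:size]
    cy = cx = size / 2.0
    phi = np.arctan2(yy - cy, xx - cx)
    radius = r0 + sum(w[k] * np.sin((k + 1) * phi + theta) for k in range(d))
    return (xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2

mask = random_shape_mask(seed=0)
print(mask.sum(), "object pixels")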
Figure A8
Sample images from the data sets: (a) R, (b) RC, (c) C, (d) C1, (e) C2, (f) C3, (g) RA, (h) RA200, (i) RA400.
Approximation errors
In this section we discuss different sources of error in the model computation that can affect the Weber fraction. The analysis draws on the model outputs for the various data sets, on the parameters of the optimal estimators, on a breakdown by cumulative area, and on outliers in the overall Weber evaluation (see Table 1). 
Figures A9 and A10 show a comparison of the outputs of the noise-free and noise-affected models (Figure A9: data sets TR, R, RC, C; Figure A10: data sets C1, C2, C3, RA, RA200, RA400). This comparison also covers the full number range up to the maximum of 32 tested in this study. For large numbers the standard deviation increases, which can have multiple causes. First, in our data sets the mean object size decreases with a higher number of objects, such that the Gaussian filter can smooth them below threshold. Second, with a higher number of objects, the probability increases that two objects lie so close together that they are counted as one. In both cases we expect a smaller model output, which is confirmed by Figure A9, where the mean model output (blue line) falls below identity (black line) for larger numbers of objects in all data sets. The increase of the standard deviations is in line with the increased probability that one of these cases occurs for larger numbers. We also observe larger deviations from identity for more curved objects (top row, left to right, in Figure A9). This is due to the higher degree of regularity required for the computation of the discretized derivatives—i.e., a larger σ is required. The introduction of noise to the system stabilizes the mean responses of the model but also increases the standard deviation, as can be seen in the bottom row of Figure A9. 
Figure A9
Mean model output with standard deviation plotted against the true number in the stimuli for the zero-noise case (σin, σlin) = (0, 0) (top, blue) and the noise case (σin, σlin) = (0.4, 2) (bottom, red) for different data sets: (a) TR, (b) R, (c) RC, (d) C.
Figure A10
Mean model output with standard deviation plotted against the true number in the stimuli for the zero-noise case (σin, σlin) = (0, 0) (top, blue) and the noise case (σin, σlin) = (0.4, 2) (bottom, red) for different data sets: (a) C1, (b) C2, (c) C3, (d) RA, (e) RA200, (f) RA400.
As a second analysis, we compare Weber fractions for different reference numbers. From rectangular to circular objects (data sets R, RC, and C) we found slightly different but similar Weber fractions (see Table A3). In the same–different task we observe better numerosity discrimination in data set RC for number 8 and worse discrimination for number 16. These differences may be explained by the optimal-estimator parameters in Table A4. The parameters for the evaluation were obtained from data set TR and describe an optimum of the ROC; comparing them with the parameters obtained from data sets R and RC should explain the observed differences. The parameters are determined by maximizing the true positive rate and minimizing the false positive rate. For number 8 in the same–different task, the parameters of R nearly match the parameters of the training set, such that the Weber fraction of R is a good baseline. Compared to the interval of the training set, the interval of RC is a subset with a smaller length. This directly implies that the parameter setting used differs from the optimum of the ROC in data set RC. As the interval is a subset, the true positive rate is at least the same or higher. This can be interpreted as a constant or lower valley in the curve in Figure 11c, for example. If the true positive rate is constant, a better Weber fraction would emerge only if the false positive rate were smaller. For the curve in Figure 11c this would imply that the values next to the tuned numerosity 8 must be higher—i.e., the curve becomes steeper. If the true positive rate is higher, the false positive rate can stay constant or decrease to explain the observed effect of a smaller Weber fraction. But a smaller false positive rate is not plausible in this setup, because the interval of the estimator is larger than the optimal interval for data set RC, which implies that, in the worst case, more different stimuli are classified as same. The most plausible account is that the true positive rate increased while the false positive rate remained constant. For number 12 the intervals are similar for all data sets, resulting in similar Weber fractions. For number 16 we find a parameter setup similar to that of number 8 (the optimal interval of RC is a subset of the estimator interval from TR), but the Weber fraction is worse for RC. Here it is plausible that the false positive rate increased, resulting in a worse Weber fraction. 
Table A3
Parameters of the continuous data-fitting functions. Notes: In all cases the noise levels were σin = 0.4 and σlin = 2. The parameters of the optimal estimators were obtained from data set TR (see Table A4).
Table A4
Parameters of optimal estimators. Notes: In all cases the noise levels were σin = 0.4 and σlin = 2.
In the larger–smaller task the same arguments hold for number 8, but numbers 12 and 16 are more complicated. For number 16 the threshold of RC does not differ much from the estimator threshold (TR), but the difference in the Weber fraction is quite large. In this task the similar threshold also implies a constant or higher true positive rate, but here the distinction is made between two sets that both contain multiple numbers. Even if the true positive rate stays constant, which is plausible for number 16, the distribution over the whole set of larger numbers can change dramatically, which could result in a worse Weber fraction. Analogous arguments apply to data set C. 
As a next step we investigated invariance with respect to cumulative area (Figure A11). The increase in Weber fraction with cumulative area results from the decreased mean distance between individual objects; increasing the minimum distance between the objects in the data set would resolve this issue. 
Figure A11
The overall discrimination-ability curves of the optimal estimator in the case of (a) the same–different task and (b) the larger–smaller task for different cumulative areas and fixed (σin, σlin) = (0.4, 2), shown on a log scale for data set R. The Weber fractions with increasing cumulative area are (a) (0.135, 0.1438, 0.154, 0.178) and (b) (0.137, 0.133, 0.153, 0.178).
In Table 1 we find a strong deviation for data set RA. High frequencies were used to generate the random shapes of the objects at a small image size. If the frequency is too high, the discrete approximation of the line integral in the numerosity computation causes large errors, which explains the high Weber fraction for data set RA. 
We are thus able to identify the following important parameters which can cause model errors: 
  •  
    Regularity: The approximation of the differential operators applied to the luminance function and the boundary curve must be sufficiently accurate. This can be guaranteed by a sufficiently large σ in the Gaussian filter function. High regularity is necessary to allow arbitrary shape variations.
  •  
    Minimum distance: There is a strong dependence between the minimum distance between objects and the standard deviation σ. The larger the σ value, the larger the minimum distance must be to guarantee a clear separation of distinct objects.
  •  
    Resolution: The resolution can restrict the kinds of stimuli that are possible with respect to combinations of cumulative area, number, and shape. If the resolution is small and the cumulative area high, increasing the number increases the probability that two objects lie close together. The standard deviation σ then becomes a trade-off between fulfilling the regularity requirement and satisfying the minimum distance; in either case, errors occur.
To summarize, when the resolution and, with it, the standard deviation σ are increased, in combination with a controlled minimum distance between objects, the model becomes a perfect enumerator. 
Figure 1
A configuration of three-dimensional simply connected objects. The shown objects (mushrooms) differ in their position, size, and shape. The main question is how the numerosity of objects can be determined from such a configuration. (Strictly speaking, this illustration gives only an impression of the three-dimensional case, since the projection to a two-dimensional image already amounts to an estimate obtained via sensors.) There exist multiple possible approaches for estimating numerosity. If time is not limited, one may sequentially count the objects. For up to four objects, the number can be determined immediately in a process named subitizing. We want to deal with the full range of numbers, and we therefore investigate the approximate number sense, which estimates numerosity under a strict time limitation.
Figure 2
The different kinds of invariance properties required for numerosity estimation, illustrated in the two-dimensional case. The right column shows the combination of different operations, with increasing complexity from top to bottom. The kinds of operations are partitioned into two groups: geometric invariances and topological invariances. Geometric invariances are defined with respect to operations which change the position, size, and orientation of the objects. Topological invariances are defined with respect to operations which change the shape of the object (formally, with respect to homeomorphisms). Each topological invariance is automatically a geometric invariance.
Figure 3
(a) The surface of a three-dimensional sphere. (b) The sphere in (a) is transformed via homeomorphisms into a smaller sphere (top), a smaller sphere with a dent (middle), and a smaller sphere with a bulge (bottom). (c) The Gaussian curvature of the 3-D objects in (b), shown qualitatively by coloring the surface by the kind of curvature. The elliptic parts of the surface are blue, the hyperbolic parts are red, and the parabolic parts are white. The color saturation shows the amount of curvature contributed to the integral in the Gauss–Bonnet theorem. This illustrates the meaning of the Gauss–Bonnet theorem. The distribution of the Gaussian curvature of the surface is strongly affected by the applied homeomorphism, but the integral over the whole surface of this curvature is independent of the homeomorphism's effect. (Adapted from Kluth & Zetzsche, 2014.)
Figure 4
(a) A slightly lightened three-dimensional cube. (b) The luminance function of the image in (a) as a two-dimensional surface embedded in the three-dimensional space. Application of differential operators to the luminance surface becomes numerically unstable because of emerging discontinuities which require the incorporation of additional arcs in the general solution.
Figure 5
A possible architecture of the proposed model and how error is introduced into the system. The information-processing direction is from left to right; each box represents one processing step. The processing steps in which noise is included have a slightly darker background color. The open circle in each box is a placeholder for the input coming from the left. Linear operation stages are blue, and nonlinear ones are red. The stimulus is fed into the system as a luminance function l in the input stage (green box). Then the additive normally distributed noise ηin ∼ N(0, σin) is added. The convolution with a Gaussian kernel g and the application of the differential operator can be interpreted as one linear-filter processing stage—i.e., the convolution with the respective derivative of the Gaussian kernel. The results of the linear filter operation are fed into the next noise-adding unit, where different samples of ηlin ∼ N(0, σlin) are added to each input. The following multiplication (GC-Product), summation (GC-Sum), and division (GC-Norm) form the cortical gain-control stage, which results in an estimate of the required curvature quantities. Finally, the curvature quantities are integrated (blue box), resulting in an estimate for the numerosity N.
Figure 6
 
Various stimuli (size 100 × 100 pixels) and the corresponding responses of the model defined by Equation 9. Stimuli vary with respect to criteria such as (a) cumulative area, (b) number change by morphing, (c) convex and concave shape, (d) contrast of a single object, and large numbers of (e) convex and (f) concave objects.
Figure 7
 
Various stimuli (size 100 × 100 pixels) and the corresponding responses of the model defined by Equation 9 (N2: σ = 2Δx; N4: σ = 4Δx; Nt: σ = 2Δx, with the threshold τ set to 0.3 times the maximum luminance value). Stimuli vary with respect to criteria such as (a) grating texture, (b) convex and concave object morphing, increasing numbers of convex and concave (c) binary and (d) grating-textured objects, (e) textured concave objects, and (f) contrast variations of multiple rectangles.
Figure 8
 
Various stimuli of three circles with different numbers of connecting lines and the corresponding model responses (Nk: σ = kΔx, k = 4, 6, 8). The image size is varied from (a) 100 × 100 pixels to (c) 200 × 200 pixels. The relative size of the circles is constant. The absolute width of the connecting lines is also constant, so their width relative to the circle size decreases.
Figure 9
 
Mean model output with standard deviation plotted against the true number in the stimuli for the zero-noise case (σin, σlin) = (0, 0) for different data sets: (a) TR, (b) R, (c) RC, and (d) C.
Figure 10
 
SNR curves with respect to the standard deviation of the noise for data set R, for (a) the input stage and (b) the linear filter stage. The data plotted in (a) and (b) are listed in Table A1. (c) The mean bit rates required to encode the signals of each stage. The noise parameters chosen for the empirical investigation of numerosity-judgment tasks are highlighted with red circles in (a–c).
Figure 11
 
Computational results from same–different and larger–smaller tasks where the estimators were trained on data set TR and the evaluation was done on data set R. The noise parameters were σin = 0.4 and σlin = 2; the corresponding estimator parameters can be found in Table A4. The left column shows the relative frequencies of the estimators responding “different,” and the right column the relative frequencies of the response “larger.” In both cases, the relative frequencies are plotted against functions of the true numerosities in the stimulus. The data points resulting from the evaluation process are shown as squares, rhombuses, and triangles. The continuous graphs are the fitted log-normal distributions described in the Appendix. The graphs are skewed on a linear scale (a, d) and become symmetric on a logarithmic scale (b, e). Plotting the frequencies against the ratio to the reference numerosity Ntuned on a logarithmic scale shows that the graphs for all reference numerosities approximately coincide (c, f). Fitting the continuous functions to the data points of all reference numerosities on this log-ratio scale yields the overall numerosity-discrimination ability (black curves in c, f) for the illustrated reference numerosities.
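The following sketch shows one way curves like those in (c, f) can be obtained for the larger–smaller task: fit a cumulative Gaussian in log(N/Ntuned) to the response frequencies and read the Weber fraction off the fitted width. The data points below are made up for illustration, and the fitting functions actually used are defined in the Appendix.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def p_larger(log_ratio, mu, s):
    # Cumulative Gaussian in log(N / N_tuned): symmetric on the log axis,
    # skewed when plotted against the raw numerosity.
    return norm.cdf((log_ratio - mu) / s)

# Hypothetical relative frequencies of the response "larger".
ratios = np.array([0.5, 0.7, 1.0, 1.4, 2.0])          # N / N_tuned
freq_larger = np.array([0.04, 0.21, 0.50, 0.79, 0.96])

(mu, s), _ = curve_fit(p_larger, np.log(ratios), freq_larger, p0=(0.0, 0.3))
weber = np.exp(s) - 1.0   # one common convention; approximately s for small s
print(f"log-width s = {s:.3f}, Weber fraction w ~ {weber:.3f}")
```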
Figure A1
 
The distribution of Weber fractions over various combinations of input noise σin and linear-filter output noise σlin, illustrated for (a) the same–different task and (b) the larger–smaller task. The distributions are shown for σlin ∈ [0, 1] (left) and σlin ∈ [1, 5] (right). The Weber fractions result from the evaluation of data set R with the parameters of the optimal estimators obtained from data set TR. The black line marks a constant Weber fraction of w = 0.15 within the shown distributions. The red and blue circles highlight the parameter combinations used for Figure A2.
Figure A2
 
The response of the optimal estimator in the case of the larger–smaller task for different noise-parameter combinations (σin, σlin), on a log scale. The noise-parameter combinations are (a) (0.8, 0) and (b) (0.4, 2), resulting in similar overall Weber fractions of (a) w = 0.172 and (b) w = 0.169.
Figure A3
 
Data set of various stimuli with convex and concave objects with constant cumulative area A (in pixels) and numerosity N. All stimuli are binary images of size 100 × 100 pixels.
Figure A4
 
Responses of different models to the stimuli illustrated in Figures A3 and A7. (a) The relative responses (difference from reference stimulus 1) to the convex and concave stimuli from Figure A3. (b) The relative responses (difference from the reference stimulus with A = 2,500) to stimuli with different cumulative area A (compare Figure A7a). The data set in Figure A7b was used to determine the mean difference in model response per numerosity, and this value was used to determine ΔN. The dotted black lines mark an absolute difference of 1 in numerosity with respect to the reference stimulus.
Figure A5
 
Three objects illustrated by their one-dimensional boundary curves. (a) The ratio of squared perimeter to area increases from left to right. (b) The positive curvature (blue) and the negative curvature (red) of the boundary curve. (c) The area is constant for all objects, and the frequency of the sinusoidal grating along the boundary of a circle increases from left to right. The higher the frequency, the larger the contour length, resulting in a decreasing isoperimetric quotient from left to right.
Figure A6
 
(a) The filter function h that is used. The radius of the inner disk Ωin is 4 and the radius of the whole domain Ω is 7.24. The filter function was determined by a cubic spline interpolation such that its integral over the whole domain is zero. (b) An example binary stimulus. (c) The linear filter output l * h. The integral over the whole domain in (c) is approximately −0.015. (d) A modified sigmoid function F(x) = −2[1/(1 + e^(−x)) − 1/2] is applied to the filter output. The area effect of the positive and negative contributions at the corners becomes apparent, resulting in an integral over the whole domain in (d) of approximately 131.273.
Figure A7
 
Two data sets of various stimuli. (a) One rectangular object with increasing cumulative area A. (b) Multiple rectangular objects with constant cumulative area A. All stimuli are binary images of size 100 × 100 pixels; cumulative area A is in pixels.
Figure A8
 
Sample images from the data sets: (a) R, (b) CR, (c) C, (d) C1, (e) C2, (f) C3, (g) RA, (h) RA200, (i) RA400.
Figure A9
 
Mean model output with standard deviation plotted against the true number in the stimuli for the zero-noise case (σin, σlin) = (0, 0) (top, blue) and the noise case (σin, σlin) = (0.4, 2) (bottom, red) for different data sets: (a) TR, (b) R, (c) RC, (d) C.
Figure A10
 
Mean model output with standard deviation plotted against the true number in the stimuli for the zero-noise case (σin, σlin) = (0, 0) (top, blue) and the noise case (σin, σlin) = (0.4, 2) (bottom, red) for different data sets: (a) C1, (b) C2, (c) C3, (d) RA, (e) RA200, (f) RA400.
Figure A11
 
The overall discrimination-ability curves of the optimal estimator in the case of (a) the same–different task and (b) the larger–smaller task for different cumulative areas and fixed (σin, σlin) = (0.4, 2), shown on a log scale for data set R. The Weber fractions with increasing cumulative area are (a) (0.135, 0.1438, 0.154, 0.178) and (b) (0.137, 0.133, 0.153, 0.178).
Table 1
 
Parameters of the continuous data-fitting functions for concave data sets. Notes: In all cases the noise levels were σin = 0.4 and σlin = 2. The parameters of the optimal estimators were obtained from data set TR.
Table A1
 
PSNR and SNR. Notes: The PSNR and SNR functions and values for the input and linear filter stages are shown for the theoretical setup and the three data sets TR, R, and RC.
Table A2
 
Model comparison by operations. Notes: The different models (left to right) are split into their basic operations (top to bottom). The operations are grouped into layers, each consisting of a linear operation followed by a nonlinear one (NL). The contrast energy model is by Dakin et al. (2011); the network model is by Stoianov and Zorzi (2012).
Table A3
 
Parameters of the continuous data-fitting functions. Notes: In all cases the noise levels were σin = 0.4 and σlin = 2. The parameters of the optimal estimators were obtained from data set TR (see Table A4).
Table A4
 
Parameters of optimal estimators. Notes: In all cases the noise levels were σin = 0.4 and σlin = 2.