Information in the cortex is thought to be represented by the joint activity of neurons. Here we describe how fundamental questions about neural representation can be cast in terms of the topological structure of population activity. A new method, based on the concept of persistent homology, is introduced and applied to the study of population activity in primary visual cortex (V1). We found that the topological structure of activity patterns when the cortex is spontaneously active is similar to that evoked by natural image stimulation and consistent with the topology of a two-sphere. We discuss how this structure could emerge from the functional organization of orientation and spatial frequency maps and their mutual relationship. Our findings extend prior results on the relationship between spontaneous and evoked activity in V1 and illustrate how computational topology can help tackle elementary questions about the representation of information in the nervous system.

$X$ and $Y$ exists, we say they are topologically equivalent and write $X \sim Y$; otherwise, we write $X \not\sim Y$. This notion of equivalence is illustrated in Figure 1, where the reader is invited to mentally visualize the possible transformations between the various objects to verify the stated equivalence relationships.

$X$ can be arranged in a sequence, $b(X) = (b_0, b_1, b_2, \ldots)$, where $b_0$ represents the number of connected components, $b_1$ represents the number of one-dimensional holes, $b_2$ the number of two-dimensional holes, and so forth. An important property of Betti sequences is that if two objects are topologically equivalent (they can be deformed into each other) they share the same Betti sequence. One must note, as we will shortly illustrate, that the reverse is not always true: two objects can be different but have the same Betti sequence.

$d(x, y)$ between any two points.

$\varepsilon$, and we proceed to connect all points for which $d(x, y) < \varepsilon$ with edges, all triplets for which all pairwise distances are smaller than $\varepsilon$ with triangles, all quadruplets for which all pairwise distances are smaller than $\varepsilon$ with tetrahedra, and so on. The Betti numbers are then computed based on the Rips complexes at different values of $\varepsilon$. The parameter $\varepsilon$ effectively controls the “spatial scale” of analysis.
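The construction just described can be sketched in a few lines of Python. This is a minimal illustration rather than the software used in the study: the function name `rips_complex`, the `max_dim` argument, and the choice of Euclidean distance (`math.dist`) are assumptions made for the example.

```python
from itertools import combinations
from math import dist


def rips_complex(points, eps, max_dim=2):
    """Return {k: list of k-simplices} of the Rips complex at scale eps.

    A simplex is a tuple of point indices. It is included when every
    pairwise distance among its vertices is strictly smaller than eps.
    Euclidean distance is an assumption; any metric d(x, y) would do.
    """
    n = len(points)
    close = [[dist(points[i], points[j]) < eps for j in range(n)]
             for i in range(n)]
    simplices = {0: [(i,) for i in range(n)]}
    for k in range(1, max_dim + 1):
        # Brute-force enumeration: fine for small examples only.
        simplices[k] = [s for s in combinations(range(n), k + 1)
                        if all(close[a][b] for a, b in combinations(s, 2))]
    return simplices


# Four corners of a unit square: the sides have length 1, the diagonals sqrt(2).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
small = rips_complex(square, eps=1.1)   # only the four sides are connected
large = rips_complex(square, eps=1.5)   # diagonals appear, triangles fill in
```

At the smaller scale the square keeps its central hole; at the larger scale every triplet of corners spans a triangle and the hole is filled, which is the scale dependence that the parameter controls.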

$\varepsilon$ goes from zero to infinity. For each Betti number, we keep a separate graph. Connected components are drawn as horizontal lines in the $b_0$ graph, one-dimensional holes correspond to horizontal lines in the $b_1$ graph, two-dimensional holes in the $b_2$ graph, and so on. For each hole, the horizontal line has its endpoints at the values of $\varepsilon$ at which the structure was first created and then destroyed. The set of all these lines together is called a barcode.
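For the $b_0$ graph alone, the barcode can be obtained without building the full complex: every point starts as its own component at scale zero, and a component is destroyed at the scale at which an edge first merges it with another. The sketch below uses a union-find structure for this. The function name `b0_barcode` and the Euclidean metric are assumptions for illustration, and higher-dimensional bars require the full homology computation.

```python
from itertools import combinations
from math import dist, inf


def b0_barcode(points):
    """Birth/death intervals of connected components as eps grows from 0."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process candidate edges from shortest to longest.
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, length))  # one component is destroyed here
    bars.append((0.0, inf))             # the last component never dies
    return bars
```

For three points on a line at positions 0, 1 and 3, the bars end at 1, 2 and infinity: two merges, and one component that survives at every scale.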

$\varepsilon$ and the corresponding barcode is shown in Figure 3. The data are randomly sampled points from a torus. In each panel, the left three graphs show the barcode obtained from this calculation. The graph on the top corresponds to $b_0$, meaning that each horizontal line represents a different connected component; the middle graph corresponds to $b_1$, where each horizontal line corresponds to a one-dimensional loop; the bottom graph corresponds to $b_2$, where horizontal lines represent two-dimensional holes. The illustrations to the right of the barcode show the state of the Rips complex for the selected value of $\varepsilon$, which is indicated by the red vertical bars in the barcodes. The value of $\varepsilon$ increases as we move from Figure 3a to Figure 3d. The blue edges in the Rips complex link the points for which $d(x, y) < \varepsilon$. The panels to the right show red triangles defined by triplets of points for which the pairwise distances satisfy $d(x, y) < \varepsilon$. The blue triangles show triangles that will later be added to the Rips complex at *higher* values of $\varepsilon$ but should be considered “holes” for the level at which they are being shown. For simplicity, we do not show higher-order building blocks, such as tetrahedra.

$\varepsilon$, only one edge exists and the resulting structure has many different connected components and no holes of any dimension (Figure 3a, top panels). The Betti sequence for this value of $\varepsilon$ can be recovered by counting how many horizontal lines (corresponding to different holes) the red vertical line crosses in each of the graphs. The resulting sequence is $(b_0, b_1, b_2) = (50, 0, 0)$. At a higher value of $\varepsilon$, we see more edges being added (thereby reducing the number of connected components) but still no holes of any dimension (Figure 3b). The corresponding sequence is $(b_0, b_1, b_2) = (38, 0, 0)$. At the next higher value of $\varepsilon$, we finally obtain a single connected component (Figure 3c). The vertical bar in the graph corresponding to $b_1$ crosses two horizontal lines, meaning that there are two one-dimensional holes. However, the red line does not intersect any horizontal lines in the graph for $b_2$, meaning that there are no two-dimensional holes at this scale (this is because some of the triangles still need to be filled in). The Betti sequence for this value of $\varepsilon$ is then $(b_0, b_1, b_2) = (1, 2, 0)$. Finally, at a slightly higher value of $\varepsilon$, the correct signature of the torus emerges, $(b_0, b_1, b_2) = (1, 2, 1)$ (Figure 3d). This Betti sequence then persists for a long interval of $\varepsilon$.
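Reading a Betti sequence off a barcode amounts to counting, in each graph, the bars crossed by a vertical line at the chosen scale. A minimal sketch follows, in which the function name `betti_at` and the toy interval values are hypothetical and are not taken from Figure 3.

```python
from math import inf


def betti_at(barcode, eps):
    """Count, per dimension, the intervals [birth, death) that contain eps.

    `barcode` is a list indexed by dimension; each entry is a list of
    (birth, death) pairs.
    """
    return tuple(sum(1 for birth, death in bars if birth <= eps < death)
                 for bars in barcode)


# Hypothetical barcode with a torus-like signature at intermediate scales.
toy_barcode = [
    [(0.0, inf), (0.0, 0.2)],       # b0: two components, one merges at 0.2
    [(0.3, 0.9), (0.35, 0.8)],      # b1: two loops
    [(0.5, 0.85)],                  # b2: one enclosed void
]
signature = betti_at(toy_barcode, 0.6)  # -> (1, 2, 1)
```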

$b_0, b_1, b_2, \ldots$) that is stable over a “long” interval of spatial scales ($\varepsilon_{\mathrm{low}}, \varepsilon_{\mathrm{high}}$) is a good candidate for our estimate of the topological structure of the data set. In this case, one would propose $(b_0, b_1, b_2) = (1, 2, 1)$ as our guess for the underlying topology of the space. We will see how a statistical method can be developed to estimate the probability that such a signature could have arisen by chance.
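One way to make “stable over a long interval” concrete is to measure the longest stretch of scales over which the Betti sequence read from the barcode equals a target signature. The sketch below is an illustration under assumptions: the function name `longest_signature_interval` and the toy barcode are hypothetical, and only the range between the smallest and largest finite endpoints is examined.

```python
from math import inf


def longest_signature_interval(barcode, target):
    """Length of the longest eps-interval on which the Betti sequence equals target."""
    # Betti numbers can only change at bar endpoints, so it suffices to
    # evaluate the signature once between each pair of consecutive endpoints.
    cuts = sorted({e for bars in barcode for bar in bars for e in bar
                   if e != inf})
    best = run = 0.0
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        sig = tuple(sum(b <= mid < d for b, d in bars) for bars in barcode)
        # Extend the current run if the signature matches, otherwise reset it.
        run = run + (hi - lo) if sig == target else 0.0
        best = max(best, run)
    return best


toy_barcode = [
    [(0.0, inf), (0.0, 0.2)],
    [(0.3, 0.9), (0.35, 0.8)],
    [(0.5, 0.85)],
]
torus_length = longest_signature_interval(toy_barcode, (1, 2, 1))  # about 0.3
```

In this toy example the signature (1, 2, 1) holds between 0.5 and 0.8, so its length is about 0.3, while (1, 2, 0) holds only between 0.35 and 0.5.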

$\varepsilon$, which moves from left to right, as the spatial scale goes from fine to coarse over the animation sequence. The middle structure illustrates the data in 3D space (rotating over time for visualization purposes) and the edges (or 1-simplices) being formed as $\varepsilon$ increases. The graph on the right shows the triangulation (or 2-simplices) being created as $\varepsilon$ increases. In all three examples, it can be seen how at an appropriate scale of analysis, the constructions yield the correct Betti numbers for these objects.

$i$th cell is given by:

$i = 0, \ldots, N - 1$. Here, $N$ represents the total number of cells in the population, $\kappa$ determines the bandwidth of tuning, and $r_{\max}$ the maximum spike rate. These rate functions were used to generate spike counts that were Poisson distributed. We selected a value $\kappa = 2$ to match the average tuning as observed experimentally (Ringach, Shapley, & Hawken, 2002) (Figure 5a).

$r_{\max}$, $N$). In all cases, we simulated 100 presentations of 18 orientations equally spaced around the circle. Thus, in all conditions, our data set (or point cloud) consisted of 1800 points (Figure 5b). Given this simulated data, we applied our algorithm to calculate the maximum interval of the parameter $\varepsilon$ for which we observe the signature of a circle: $(b_0, b_1, b_2) = (1, 1, 0)$ (Figure 5c). To estimate the likelihood that the result could have arisen by chance, we shuffled the elements of the data matrix in Figure 5b and re-computed the maximal length a total of 100 times. The probability that the measured length was obtained by chance was assessed from this empirical distribution (Figure 5d).
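The shuffling test can be sketched generically. In the code below, `statistic` is a placeholder for whatever quantity is being tested (in the text, the maximal length of the interval showing the target signature); the function name `permutation_p_value`, the list-of-rows data layout, and the fixed random seed are assumptions made for illustration.

```python
import random


def permutation_p_value(data, statistic, n_shuffles=100, seed=0):
    """Fraction of shuffled data sets whose statistic is >= the observed one.

    `data` is a matrix given as a list of equal-length rows; every shuffle
    permutes all of its elements, destroying any structure across rows
    and columns while preserving the overall distribution of values.
    """
    rng = random.Random(seed)
    observed = statistic(data)
    n_rows, n_cols = len(data), len(data[0])
    flat = [v for row in data for v in row]   # copy; `data` is not modified
    exceed = 0
    for _ in range(n_shuffles):
        rng.shuffle(flat)
        shuffled = [flat[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]
        if statistic(shuffled) >= observed:
            exceed += 1
    return exceed / n_shuffles
```

With 100 shuffles the smallest nonzero value this estimate can take is 0.01, so the number of shuffles bounds the resolution of the reported probability.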

$p(r_{\max}, N)$, which is plotted in the pseudo-color image of Figure 6a. The dashed line shows the approximate boundary for detecting a circle at a significance level of $p < 0.05$. This analysis makes it evident that there is a trade-off between the number of cells and mean spike counts per time bin that is necessary to detect the circle at a fixed significance level. The larger the number of neurons, the smaller the spike rates can be and still allow for the reliable estimation of the underlying topology. For five cells, for example, one would need an average of ∼4.5 spikes per time bin; for 10 cells, on the other hand, the rates can be as low as 1.5 spikes per time bin. This dependence indicates that the total number of spikes collected is a key variable controlling the statistical power of the technique.

$\theta, \phi$) given by

$N$. For any given population, the centers of the tuning curves, $(\theta_i, \phi_i)$, were chosen randomly inside the rectangle $[0, \pi] \times [0, 2\pi]$, such that the tuning curve of the $i$th cell was given

$i = 1, \ldots, N$. For this simulation, we used values of $\kappa_\theta = 2$ and $\kappa_\phi = 1.5$. Here, $r_{\max}$ represents the mean number of spike counts in a time bin. In all cases, we simulated 100 presentations of all 400 stimuli $(\pi k/20, \pi l/20)$, where $k, l = 0, \ldots, 19$. Thus, in all situations there are a total of 40,000 points in the data set. We then calculated the maximal length of the signature $(b_0, b_1, b_2) = (1, 2, 1)$. To evaluate the statistical significance of the result, we computed the same statistic for 50 random permutations of all the elements within the data matrix.

$r_{\max}$, $N$) we computed $p(r_{\max}, N)$, which is shown in the pseudo-color image of Figure 6b. The dashed line shows the approximate boundary for detecting a torus at a significance level of 0.05. As for the case of the circle, the larger the number of neurons, the smaller the spike rates can be and still allow for the reliable estimation of the underlying topology. For five cells, for example, the method would require an average of ∼4.5 spikes per time bin, while for 15 cells, on the other hand, the rates can be as low as 1.5 spikes per time bin.

$r_{\max}$, $1.5 \times r_{\max}$) (Figure 7, top). We then used a negative binomial distribution to generate counts with a variance/mean ratio of 1.9 (Kang, Shapley, & Sompolinsky, 2004) but with a uniform rate across the population (Figure 7, middle). Finally, we ran a simulation with non-Poisson firing and non-uniform firing rates (Figure 7, bottom). All the other parameters of the simulation were exactly the same as those in the simulations of Figure 6a. We can see that these manipulations have little effect on our ability to detect the structure of the object, demonstrating the robustness of the technique.

*Guidelines for the Care and Use of Mammals in Neuroscience*. Experiments were performed on three old-world monkeys (*Macaca fascicularis*, 3.2–4.5 kg). Initially, animals were sedated with acepromazine (30–60 μg/kg) and anesthetized with ketamine (5–20 mg/kg, im). Initial surgery was then performed under 1.5–2.5% isoflurane. Two intravenous lines were put in place for the continuous infusion of drugs. A urethral catheter was inserted to collect and monitor urine output. An endotracheal tube was inserted to allow for artificial respiration. Pupils were dilated with ophthalmic atropine, and the eyes protected with Tobradex (Alcon Laboratories, Texas) and custom-made gas permeable contact lenses.

μg/kg/h) and propofol (2–6 mg/kg/h). After monitoring the anesthetic plane for about 10–20 minutes, we proceeded to perform a craniotomy over primary visual cortex. Only after the completion of all surgical procedures, including the insertion of the electrode array, was the animal paralyzed (Pavulon, 0.1 mg/kg/h).

$_2$, SpO$_2$, and EEG were continually monitored by an HP Virida 24C neonatal monitor. Urine output and specific gravity were measured every 4–5 h to ensure adequate hydration. Drugs were administered in balanced physiological solution at a rate to maintain a fluid volume of 5–10 ml/kg/h. Rectal temperature was maintained by a self-regulating heating pad at 37.5°C. Expired CO$_2$ was maintained between 4.5 and 5.5% by adjusting the stroke volume and ventilation rate. The maximal pressure developed during the respiration cycle was monitored to verify that there was no incremental blocking of the airway. A broad-spectrum antibiotic (bicillin, 50,000 IU/kg) and anti-inflammatory steroid (dexamethasone, 0.5 mg/kg) were given at the beginning of the experiment and every other day.

μm. The receptive fields of neurons from the arrays overlapped significantly (only those at opposite ends of the array were non-overlapping). Thus, our recordings come from populations whose receptive fields are responding to the same area of visual space.

$_2$ and displayed on a monitor at a refresh rate of 100 Hz and a typical screen distance of 80 cm. The mean luminance was 56 cd/m$^2$. A Photo Research Model 703-PC spectro-radiometer was used for calibration. The eyes were initially refracted by direct ophthalmoscopy to bring the retinal image into focus for a stimulus roughly 80 cm from the eyes. Once neural responses were isolated, we measured spatial frequency tuning curves and maximized the response at high spatial frequencies by changing external lenses in steps of 0.25 D. This procedure was performed independently for both eyes.

$_2$ computer played back the images during the experiment on a computer screen that measured 34.3 cm wide by 27.4 cm high. The refresh rate of the monitor was 90 Hz and each movie image was presented for six consecutive frames. The mean luminance of the display was 56 cd/m$^2$. Stimulation was monocular to the dominant eye (the other eye was occluded). The images subtended 6 deg × 4.5 deg of visual angle and covered all the receptive fields under measurement.

$^5$. The software package PLEX was used with a weak witness complex construction. PLEX is a Matlab collection of functions for computational topology and is available at http://math.stanford.edu/comptop/programs/. We used a weak witness construction with 35 landmark points, which were selected using the “max–min” procedure (see 1). The “max–min” procedure was seeded with each one of the 200 points in the data set in order to eliminate dependence on our initial selection. We recorded the maximal length of persistence intervals for $b_1$ and $b_2$ for each of the 200 seeds.

$p < 10^{-10}$).

$b_0, b_1, b_2$). On top of each triplet of Betti numbers an object consistent with each signature is shown. The distributions of topological signatures for both experimental conditions are shown in the histograms of Figure 9b, where the $x$-axis represents the same ordering of signatures as depicted in Figure 9a. The left column represents distributions for the spontaneous condition, while the right column represents the distributions for the natural image stimulation condition. Each row represents a different “threshold” for the length of the interval of the signature (in the barcode) as a fraction of the covering radius of the data. Larger thresholds represent instances where the signature was “long-lived” and likely to represent a salient feature of the data. Nevertheless, all the topological features shown are statistically different from noise, as Monte Carlo simulations using shuffled data show that the probability of obtaining segments of $b_1$ or $b_2$ longer than 0.3 by chance (which is the smallest threshold used) was less than 0.005.

*both* the structure of the actual image sequences (Carlsson et al., 2008) and the structure of the cortical responses and check if there is any correlation between the topological signatures across independent trials.

*uncorrelated* with the orientation maps. If the proposed spherical model is correct, and the cortical state wandered out of the equator near the poles, one may expect the activity to be uncorrelated with the orientation states lying near the equator. An experimental prediction from this model is that if one were to measure orientation *and* spatial frequency maps, when activity is uncorrelated with the orientation maps it must be correlated with the spatial frequency maps (and vice versa). This is readily testable using the methods of Kenet et al. (2003).

$_2$ (binary, 0 or 1) coefficients. This is the reason the Betti sequences for the torus and the Klein bottle are the same in our calculations (the two objects can be differentiated if the homology is computed over a different field). Given a set of points $V$, a $k$-simplex is an unordered subset $\{v_0, v_1, \ldots, v_k\}$ where $v_i \in V$ and $v_i \neq v_j$ for all $i \neq j$. The faces of this $k$-simplex consist of all $(k - 1)$-simplices of the form $\{v_0, \ldots, v_{i-1}, v_{i+1}, \ldots, v_k\}$ for some $0 \le i \le k$.

$k$-simplex can be described as follows: given $k + 1$ points in $\mathbb{R}^m$ ($m \ge k$), the $k$-simplex is a convex body bounded by the union of $(k + 1)$ linear subspaces of $\mathbb{R}^m$ defined by all possible collections of $k$ points (chosen out of $k + 1$ points). A simplicial complex is a collection of simplices which is closed with respect to inclusion of faces. Triangulated surfaces form a concrete example, where the vertices of the triangulation correspond to $V$. The orderings of the vertices correspond to an orientation. Any abstract simplicial complex on a (finite) set of points $V$ has a geometric realization in some $\mathbb{R}^n$. Let $X$ denote a simplicial complex. Roughly speaking, the homology of $X$, denoted $H_*(X)$, is a sequence of vector spaces $\{H_k(X) : k = 0, 1, 2, 3, \ldots\}$, where $H_k(X)$ is called the $k$-dimensional homology of $X$. The dimension of $H_k(X)$ is the $k$th Betti number of $X$, $b_k(X)$, which is a measurement of the number of different holes in the space $X$ that can be sensed by using sub-complexes of dimension $k$.

$H_0(X)$ is equal to the number of connected components of $X$. These are the types of features that can be detected by using points and edges. With this construction one is answering the question: are two points connected by a sequence of edges or not? The simplest basis for $H_0(X)$ consists of a choice of vertices in $X$, one in each path-component of $X$. Likewise, the simplest basis for $H_1(X)$ consists of loops in $X$, each of which surrounds a hole in $X$. For example, if $X$ is a graph, then the space $H_1(X)$ encodes the number and types of cycles in the graph; this space has the structure of a vector space.

$X$ denote a simplicial complex. Define for each $k \ge 0$, the vector space $C_k(X)$ to be the vector space whose basis is the set of oriented $k$-simplices of $X$; that is, a $k$-simplex $\{v_0, v_1, \ldots, v_k\}$ together with an order type denoted $[v_0, v_1, \ldots, v_k]$ where a change in orientation corresponds to a change in the sign of the coefficient: $[v_0, \ldots, v_i, \ldots, v_j, \ldots, v_k] = -[v_0, \ldots, v_j, \ldots, v_i, \ldots, v_k]$ if an odd permutation is used.

$k$ larger than the dimension of $X$, we set $C_k(X) = 0$. The boundary map is defined to be the linear transformation $\partial : C_k(X) \to C_{k-1}(X)$, $k \ge 1$, producing the sequence

$C_k$: the cycles (those sub-complexes without boundary) and the boundaries (those sub-complexes which are themselves boundaries) formally defined as:

$\circ\, \partial = 0$; that is, the boundary of a chain has empty boundary. It follows that $B_k$ is a subspace of $Z_k$. This has great implications. The $k$-cycles in $X$ are the basic objects which count the presence of a “hole of dimension $k$” in $X$. But, certainly, many of the $k$-cycles in $X$ are measuring the same hole; still other cycles do not really detect a hole at all, because they bound a sub-complex of dimension $k + 1$ in $X$. We say that two cycles $\xi$ and $\eta$ in $Z_k$ are *homologous* if their difference is a boundary:

$H_k(X)$ is an equivalence class of homologous $k$-cycles. This inherits the structure of a vector space in the natural way $[\xi] + [\eta] = [\xi + \eta]$ and $c[\xi] = [c\xi]$ for $c \in \mathbb{Z}_2$.
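Because the coefficients are binary, Betti numbers of a small complex can be computed directly with linear algebra over $\mathbb{Z}_2$. Taking the cycles $Z_k$ to be the kernel of the boundary map on $C_k$ and the boundaries $B_k$ to be the image of the boundary map from $C_{k+1}$, one gets $b_k = \dim Z_k - \dim B_k = (\dim C_k - \operatorname{rank} \partial_k) - \operatorname{rank} \partial_{k+1}$. The sketch below implements this formula. The function names `rank_gf2` and `betti_numbers` and the bitmask encoding of matrix rows are choices made for illustration, not the PLEX implementation.

```python
from itertools import combinations


def rank_gf2(rows):
    """Rank over Z_2 of a matrix whose rows are given as integer bitmasks."""
    pivots = {}  # position of leading bit -> reduced row with that leading bit
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row
                break
            row ^= pivots[top]  # addition mod 2 clears the leading bit
    return len(pivots)


def betti_numbers(top_simplices, max_dim):
    """Betti numbers (b_0, ..., b_max_dim) over Z_2 of the complex generated
    by the given simplices (tuples of integer vertex labels)."""
    # Close the collection under taking faces.
    simplices = {frozenset(f) for s in top_simplices
                 for r in range(1, len(s) + 1) for f in combinations(s, r)}
    by_dim = {k: sorted(tuple(sorted(s)) for s in simplices if len(s) == k + 1)
              for k in range(max_dim + 2)}
    index = {k: {s: i for i, s in enumerate(by_dim[k])} for k in by_dim}
    ranks = {0: 0}  # the boundary of a vertex is zero
    for k in range(1, max_dim + 2):
        rows = []
        for s in by_dim[k]:
            mask = 0
            for i in range(len(s)):
                face = s[:i] + s[i + 1:]       # drop the i-th vertex
                mask |= 1 << index[k - 1][face]
            rows.append(mask)
        ranks[k] = rank_gf2(rows)
    # b_k = (dim C_k - rank d_k) - rank d_{k+1}
    return tuple(len(by_dim[k]) - ranks[k] - ranks[k + 1]
                 for k in range(max_dim + 1))


# Boundary of a tetrahedron: four triangles enclosing a void (a 2-sphere).
sphere = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
sphere_betti = betti_numbers(sphere, max_dim=2)  # -> (1, 0, 1)
```

Signs play no role here because $-1 = 1$ in $\mathbb{Z}_2$, which is why the orientation of each simplex can be ignored in the code.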

$f : X \to Y$ is a *homotopy equivalence* if there is a map $g : Y \to X$ so that $f \circ g$ is homotopic to the identity map on $Y$ and $g \circ f$ is homotopic to the identity map on $X$. This notion is a weakening of the notion of homeomorphism, which requires the existence of a continuous map $g$ so that $f \circ g$ and $g \circ f$ are equal to the corresponding identity maps. The less restrictive notion of homotopy equivalence is useful in understanding relationships between complicated spaces and spaces with simple descriptions. We say that two spaces $X$ and $Y$ are *homotopy equivalent* or that they have the same *homotopy type* if there is a homotopy equivalence from $X$ to $Y$. This is denoted by $X \sim Y$.

$H_*(X)$ is a topological invariant of $X$: it is indeed an invariant of homotopy type. Readers familiar with the Euler characteristic of a triangulated surface will not find it odd that intelligent counting of simplices yields an invariant. For a simple example, the reader is encouraged to contemplate the “physical” meaning of $H_1(X)$. Elements of $H_1(X)$ are equivalence classes of (finite collections of) oriented cycles in the 1-skeleton of $X$, the equivalence relation being determined by the 2-skeleton of $X$.

$X$ and $X'$. Let $f : X \to X'$ be a continuous simplicial map: $f$ takes each $k$-simplex of $X$ to a $k'$-simplex of $X'$ where $k' \le k$. Then, the map $f$ induces a linear transformation $f_\# : C_k(X) \to C_k(X')$. It is a simple lemma to show that $f_\#$ takes cycles to cycles and boundaries to boundaries; hence, there is a well-defined linear transformation on the quotient spaces

*induced homomorphism* of $f$ on $H_*$. Functoriality means that (1) if $f : X \to Y$ is continuous then $f_* : H_k(X) \to H_k(Y)$ is a group homomorphism; and (2) the composition of two maps $g \circ f$ induces the composition of the linear transformations: $(g \circ f)_* = g_* \circ f_*$.

$d$, together with a parameter $\varepsilon$, and construct from it some simplicial complex, for example the Rips complex, denoted $R_\varepsilon(\mathrm{X})$. This complex will have $\mathrm{X}$ as its vertex set, and a collection $\{x_0, x_1, \ldots, x_k\} \subset \mathrm{X}$ will span a $k$-simplex in $R_\varepsilon(\mathrm{X})$ if and only if $d(x_i, x_j) \le \varepsilon$ for all $0 \le i, j \le k$, where $d$ denotes the metric (distance) which is chosen depending on the problem at hand.

$R_\varepsilon(\mathrm{X})$ may be computationally intractable for very large data sets, as it requires the calculation of all pairwise distances between points in the set. Another possible construction is the *Witness complex*. Given a finite set of points $\mathrm{X}$ equipped with a distance function $d$, a set of points $L \subset \mathrm{X}$, the *landmark set*, and $\varepsilon \ge 0$, we say that a point $x \in \mathrm{X}$ is an $\varepsilon$-witness for a $(k + 1)$-tuple $\{l_0, l_1, \ldots, l_k\}$ of points in $L$ if $\max_i d(x, l_i) \le \varepsilon + m_x$, where $m_x$ denotes the $(k + 1)$th smallest value of $d(x, l)$ as $l$ varies over all of $L$.
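The witness condition translates almost literally into code. The sketch below checks whether a single point is an $\varepsilon$-witness for a tuple of landmarks. The function name `is_witness` and the use of Euclidean distance are assumptions for illustration; this is not the PLEX implementation.

```python
from math import dist


def is_witness(x, simplex, landmarks, eps):
    """True if x is an eps-witness for the landmark tuple `simplex`.

    `simplex` holds k + 1 landmark points; m_x is the (k + 1)th smallest
    distance from x to any landmark, i.e. index k in the sorted list.
    """
    k = len(simplex) - 1
    m_x = sorted(dist(x, l) for l in landmarks)[k]
    return max(dist(x, l) for l in simplex) <= eps + m_x


landmarks = [(0.0,), (1.0,), (2.0,), (10.0,)]
x = (0.5,)
# The two landmarks nearest to x are witnessed even at eps = 0.
nearest_pair_ok = is_witness(x, [(0.0,), (1.0,)], landmarks, eps=0.0)
```

With $\varepsilon = 0$ a point witnesses exactly those tuples made of its nearest landmarks (up to ties); increasing $\varepsilon$ relaxes the condition so that more tuples acquire witnesses.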

$W_\varepsilon(\mathrm{X}, L)$ to $\mathrm{X}$, $L$ and $\varepsilon$, by letting the vertex set of $W_\varepsilon(\mathrm{X}, L)$ be $L$ and declaring that a collection $\{l_0, l_1, \ldots, l_k\}$ spans a $k$-simplex in $W_\varepsilon(\mathrm{X}, L)$ if and only if there is an $\varepsilon$-witness for the collection $\{l_0, l_1, \ldots, l_k\}$ and for all its faces.

$\varepsilon \le \varepsilon'$, there is an evident inclusion $W_\varepsilon(\mathrm{X}, L) \subset W_{\varepsilon'}(\mathrm{X}, L)$. Consequently, we have an increasing family of simplicial complexes, parameterized by the real line, just as we did for the Rips complexes. In practice, the landmark set is built either by uniform random sampling over $X$ or by the max–min procedure: one first randomly picks a point $l_1$ from $\mathrm{X}$. Then, the second point $l_2$ is chosen so as to maximize $d(l_1, l_2)$. Subsequently, points are chosen to maximize the distance to the set of points already chosen. Earlier work has shown that this much smaller complex accurately represents topology in simple cases, and we regard it as a computationally tractable proxy for the Rips complex (Carlsson & DeSilva, 2004).
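The max–min procedure is a greedy farthest-point selection and can be sketched as follows. The function name `maxmin_landmarks`, the `seed_index` argument, and the Euclidean metric are assumptions for illustration; here the seed point is passed explicitly rather than drawn at random so that the result is reproducible.

```python
from math import dist


def maxmin_landmarks(points, n_landmarks, seed_index=0):
    """Greedy max-min selection of landmark indices.

    Starting from the seed, each new landmark is the point whose distance
    to the set of landmarks already chosen is largest.
    """
    chosen = [seed_index]
    # min_dist[i] = distance from point i to its nearest chosen landmark
    min_dist = [dist(p, points[seed_index]) for p in points]
    while len(chosen) < n_landmarks:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        min_dist = [min(m, dist(p, points[nxt]))
                    for m, p in zip(min_dist, points)]
    return chosen


points = [(0.0,), (1.0,), (2.0,), (10.0,)]
landmarks = maxmin_landmarks(points, n_landmarks=3, seed_index=0)  # [0, 3, 2]
```

Running the function once per possible seed, as described for the experimental data, shows how strongly the selected landmarks depend on the starting point.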

$\varepsilon$ establishes the “spatial scale” of analysis. Assume that $\mathrm{X}$ was sampled from an underlying space $X$. When $\varepsilon$ is very small, the result will be a discrete set of points; when $\varepsilon$ is large, the result will be a single simplex of dimension $\#\mathrm{X} - 1$. However, there is typically a middle range of values of $\varepsilon$ where $R_\varepsilon(\mathrm{X})$ has homology isomorphic to that of the original space and therefore has Betti numbers equal to those of $X$. Thus, one of the key concepts below is that the analysis must be done over a range of values, from low to high, looking for those scales where the topological structure remains invariant.

$X$ is a Riemannian manifold, for example, one can explicitly estimate a range of values of $\varepsilon$ for which this is the case (Niyogi, Smale, & Weinberger, 2006). In our situation, we only have the finite sample and no a priori information about the underlying space; therefore, obtaining such estimates is not practical. Edelsbrunner and colleagues (2000), however, made the following observation. Given $\varepsilon \le \varepsilon'$ there is a natural inclusion of simplicial complexes $R_\varepsilon(\mathrm{X}) \subset R_{\varepsilon'}(\mathrm{X})$, and because of the functoriality property discussed above, one obtains a linear transformation $H_k(R_\varepsilon(\mathrm{X})) \to H_k(R_{\varepsilon'}(\mathrm{X}))$ for any $k$. What Edelsbrunner et al. discovered was that in order to study the homology of a given space using a point cloud sampled from it, one should keep track of the entire system of vector spaces $H_k(R_\varepsilon(\mathrm{X}))$ along with all the linear transformations described above. Such a system is called a persistence vector space. Importantly, it was shown that persistence vector spaces admit a classification analogous to the classification result for finite dimensional vector spaces (Zomorodian & Carlsson, 2004), which asserts that two vector spaces of the same dimension are isomorphic. In the case of persistence vector spaces, it turns out that attached to each persistence vector space, there is an invariant called a *barcode* which is just a finite collection of intervals (perhaps infinite to the right), and that any two persistence vector spaces with the same barcodes are isomorphic. With computational efficiency considerations in mind, one could opt to compute barcodes using the Witness complex construction.

$\varepsilon$ is permitted to go to infinity. This is because for sufficiently large $\varepsilon$ we will construct the full complex with the given number of landmark points. If the set of landmarks is large, this may become intractable as well. For this reason, we introduce a number $R_0$ associated with a choice of landmark points $L$, which is the covering radius of the set $L$, defined by $R_0 := \max_{x \in \mathrm{X}} \min_{l \in L} d(x, l)$. In practice, we use this as an upper bound for the persistence parameter and express lengths of persistence intervals as fractions of $R_0$. When we have data which are the result of independent repeats of the same experiment, we explore the resulting topological objects obtained by plotting the relative frequency of observation for different topological signatures (sequences of Betti numbers) for different lengths of the persistence interval (which we referred to as the “threshold” in the body of the article).
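The covering radius and the normalization of interval lengths can be sketched as follows. The function names `covering_radius` and `relative_length` are hypothetical, Euclidean distance is assumed, and truncating intervals at $R_0$ is one reading of “upper bound for the persistence parameter”.

```python
from math import dist, inf


def covering_radius(points, landmarks):
    """R_0 = max over x in X of the distance from x to its nearest landmark."""
    return max(min(dist(x, l) for l in landmarks) for x in points)


def relative_length(interval, r0):
    """Length of a persistence interval (birth, death) as a fraction of R_0.

    Intervals that extend beyond R_0 (including infinite ones) are
    truncated at R_0, the upper bound of the persistence parameter.
    """
    birth, death = interval
    return (min(death, r0) - birth) / r0


points = [(0.0,), (1.0,), (2.0,), (10.0,)]
landmarks = [(0.0,), (10.0,)]
r0 = covering_radius(points, landmarks)        # 2.0: the point at 2 is farthest
fraction = relative_length((1.0, inf), r0)     # 0.5
```

Comparing `fraction` with a chosen threshold (for example 0.3, the smallest value used in the article) then decides whether an interval counts as long-lived.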