Observers use a wide range of color names, including white, to describe monochromatic flashes with a retinal size comparable to that of a single cone. We model such data as a consequence of information loss arising from trichromatic sampling. The model starts with the simulated responses of the individual L, M, and S cones actually present in the cone mosaic and uses these to reconstruct the L-, M-, and S-cone signals that were present at every image location. We incorporate the optics and the mosaic topography of individual observers, as well as the spatio-chromatic statistics of natural images. We simulated the experiment of H. Hofer, B. Singer, & D. R. Williams (2005) and predicted the color name on each simulated trial from the average chromaticity of the spot reconstructed by our model. Broad features of the data across observers emerged naturally as a consequence of the measured individual variation in the relative numbers of L, M, and S cones. The model's output is also consistent with the appearance of larger spots and of sinusoidal contrast modulations. Finally, the model makes testable predictions for future experiments that study how color naming varies with the fine structure of the retinal mosaic.

Symbol | Meaning | Value |
---|---|---|

N _{pixel} | Linear size of simulated image,
pixels ^{a} | 100–101 |

K _{image} | Size of simulated image,
minutes ^{a} | 11.4–12.6 |

K _{spot} | Simulated spot size, minutes | 0.3 |

K _{coneaperture} | Cone aperture, minutes | 0.9 |

x _{spot}, y _{spot} | Location of a particular simulated spot | |

I _{spot} | Simulated spot intensity | |

λ _{spot} | Simulated spot wavelength, nm ^{b} | 500, 550, 600 |

N _{cones} | Number of cones in image area ^{a} | 128–216 |

N _{neighbors} | Number of cone neighbors in detection model | 4 |

y _{criterion} | Criterion in detection model | 10 |

y_{meanmosaic} | Vector of mean cone responses | |

y_{noisymosaic} | Vector of noisy cone responses | |

y _{pooled} | Pooled cone response in detection model | |

p( )x | Prior over trichromatic images | |

p( ∣ y )x | Likelihood of cone responses, given trichromatic image | |

p( ∣ x )y | Posterior over trichromatic images, given cone responses | |

x_{estimate} | Vector representing reconstructed trichromatic image | |

u _{ x} | Prior mean vector for trichromatic
images x | |

K _{ x} | Prior covariance matrix for
trichromatic images x | |

u _{space} | Prior mean of spatial prior at each location | 1 |

σ _{space} ^{2} | Prior variance of spatial prior at each location | 0.25 |

ρ _{space} | Nearest neighbor correlation of spatial prior | 0.75 |

N _{linmod} | Dimensionality used in approximation of 1-D spatial prior | 20 |

u _{ u′ v′} | Mean of color prior in CIE u′ v′
chromaticity | [0.198, 0.472]′ |

σ _{ u′ v′} ^{2} | CIE u′ v′ variance of color prior | 0.09 |

κ _{color} | Factor relating σ _{factor} ^{2} and μ _{factor} | 4 |

*N*

_{pixel}×

*N*

_{pixel}image representing a small monochromatic spot was created. The image corresponded to

*K*

_{image}×

*K*

_{image}minutes of arc, and the size of the simulated spot was that of the physical stimulus used in the experiment (

*K*

_{spot}minutes in diameter). The spot location

*x*

_{spot},

*y*

_{spot}was chosen at random within the image, subject to the requirement that its center be at least one cone aperture diameter away from the edge of the image. The wavelength

*λ*

_{spot}of the simulated spots matched those used in the experiments. We specified spot intensities (

*I*

_{spot}) in arbitrary units and linked these to the physical spot intensities used in the experiments through a psychophysical model of spot detection (see below).

*K*

_{coneaperture}, in minutes of arc).

*I*

_{spot},

*λ*

_{spot}, and the Smith–Pokorny (DeMarco, Pokorny, & Smith, 1992; Smith & Pokorny, 1975) estimates of cone spectral sensitivities, with the function for each cone normalized to have a quantal sensitivity of one at its wavelength of maximum sensitivity, to compute the mean isomerization rate

*u*

_{mean}of each of the

*N*

_{cones}cones contained within the

*K*

_{image}×

*K*

_{image}minute mosaic. As noted above, intensities in the simulations were specified in arbitrary units, so we simply took the mean isomerization rates directly as the inner product of the spot spectrum and the normalized spectral sensitivities; coupling of intensity to the experiment was achieved through the detection model.

*u*

_{mean}. We refer to the result as the cone responses. In the calculations below, it is convenient to represent the mean cone responses as an

*N*

_{cones}-dimensional column vector

*y*_{meanmosaic}, and the responses with noise added as

*y*_{noisymosaic}.

*N*

_{neighbors}nearest neighbors in the mosaic. If this pooled response

*y*

_{pooled}exceeded a criterion level

*y*

_{criterion}, the flash was classified as seen. This is, in essence, the pooled detection model used by Hofer, Singer, et al. (2005). Supplementary Figures S1 and S2 compare frequency of seeing curves from the detection model to experimental data. The agreement is good when the relation between model intensity and physical intensity is scaled independently for each observer and flash wavelength to optimize the quality of the fit (Supplementary Figure S1). The detection model does not capture the within observer experimental variation in frequency of seeing curves with flash wavelength (Supplementary Figure S2) nor between observer variation. This is not surprising, given that the detection model does not incorporate photoreceptor dark noise or limitations imposed by post-receptoral processing. To predict color naming performance below, we set the relation between model intensity and physical intensity separately for each observer and wavelength.

*y*_{noisymosaic}for seen flashes were used as input to a Bayesian algorithm that reconstructs a trichromatic image. The output of the algorithm is an

*N*

_{pixel}×

*N*

_{pixel}× 3 color image. The three image planes represent the L-, M-, and S-cone components of an idealized retinal image reconstructed by the algorithm. By this, we mean the image that would have been imaged on the retina in the absence of blur by the eye's optics.

*x*_{estimate}with

*N*

_{pixel}×

*N*

_{pixel}× 3 entries. The first

*N*

_{pixel}×

*N*

_{pixel}entries represent the L-cone plane in rasterized order, the second

*N*

_{pixel}×

*N*

_{pixel}entries represent the M-cone plane, and the last

*N*

_{pixel}×

*N*

_{pixel}entries represent the S-cone plane. For purposes of this overview, the algorithm may be thought of as a procedure that takes observations

*y*_{noisymosaic}and produces an image estimate

*x*_{estimate}. The algorithm itself, which forms the core of our model, is described in more detail below.

*u*′

*v*′ chromaticity corresponding to the average LMS triplet.

*y*_{noisymosaic}to trichromatic image estimates

*x*_{estimate}. We adopted a Bayesian algorithm for trichromatic reconstruction (sometimes called demosaicing) that was developed by Brainard (1994) in the context of digital color image processing. The implementation was modified for use with the spatially irregular sampling of the human cone mosaic, and we describe the algorithm below.

*a priori*constraints on the quantity to be estimated.

*p*(

**∣**

*y***). Here**

*x***represents an idealized retinal image as described above and**

*x***represents observed cone responses. The likelihood tells us the probability of any vector of cone responses**

*y***occurring, given that the image is**

*y***.**

*x**p*(

**). This tells us the a priori probability of any image**

*x***occurring.**

*x**p*(

**∣**

*x***) as**

*y**p*(

**∣**

*x***) =**

*y**Cp*(

**∣**

*y***)**

*x**p*(

**), where**

*x**C*is a normalizing constant. The posterior tells us the probability of any image

**given the observed data**

*x***. It is then possible to choose an estimate of**

*y*

*x*_{estimate}of

**from the posterior, for example as the mean or maximum of the posterior. Figure 5 illustrates Bayesian estimation for a very simple example image reconstruction problem.**

*x*

*x*_{estimate}from the posterior.

*f*(

*λ*

_{spot},

*I*

_{spot},

*x*

_{spot},

*y*

_{spot},

*K*

_{spot}) be the function that returns

*y*_{meanmosaic}as a function of the simulated spot properties. The function

*f*() simulates the eye's optics, blurring by photoreceptor sampling, sampling by the trichromatic mosaic, and absorption of light by the L-, M-, and S-cone photopigments. To specify

*f*() for the algorithm, we used estimates of the eye's optics under normal viewing, measurements of the location and classes of the cones in a patch of each observer's retina (measured as described above), and the Smith–Pokorny estimates of cone spectral sensitivities. When viewed as a function that maps images

**to mean observations**

*x*

*y*_{meanmosaic}, this

*f*() is linear in

**.**

*x**f*() into a likelihood, we assumed that the actual cone responses were obtained from the mean cone responses through the addition of zero-mean normally distributed noise. That is, we write

*y*_{noisymosaic}=

*N*(

*y*_{meanmosaic},

**K**

_{noise}), where

**K**

_{noise}is an

*N*

_{cones}×

*N*

_{cones}diagonal matrix with each entry equal to the variance of the noise for the corresponding cone in the mosaic. The variance of the noise for each cone was taken as the mean response of that cone to the prior mean image, which may be thought of as implementing a normal approximation to Poisson photon noise. The reason for using the normal approximation is that it allows for a closed-form solution for the Bayesian estimate.

*p*(

**) as a multivariate normal distribution, so that**

*x**p*(

**) ∼**

*x**N*(

**u**

_{ x},

**K**

_{ x}). Typically, we computed with

*N*

_{pixel}= ∼100, which means that the dimensionality of

**was ∼30,000. This was too large to allow an arbitrary choice of**

*x***K**

_{ x}, so that it was necessary to impose some additional structure on the form of the prior.

**u**

_{ x}=

**u**

_{LMS}⊗

**u**

_{space}and

**K**

_{ x}=

**K**

_{LMS}⊗

**K**

_{space}, where ⊗ represents the Kronecker product. Measurements of natural images indicate that separability is a reasonable but not perfect approximation (Burton & Moorehead, 1987; Párraga, Brelstaff, Troscianko, & Moorehead, 1998; Ruderman, Cronin, & Chiao, 1998; deviations from separability for some image types reported in Párraga, Troscianko, & Tolhurst, 2002).

**u**

_{space}=

**u**

_{space_1}⊗

**u**

_{space_1}and

**K**

_{space}=

**K**

_{space_1}⊗

**K**

_{space_1}, where

**u**

_{space_1}and

**K**

_{space_1}characterize the properties of the vertical/horizontal priors. We took

**u**

_{space_1}to be a constant vector with value u

_{space}, so that the expected value of the prior was constant across image locations. We computed

**K**

_{space_1}in two steps. First, we chose a single variance

*σ*

_{space}

^{2}and nearest neighbor correlation

*ρ*

_{space}and from these generated the covariance matrix of a first-order Markov process along one spatial dimension. The parameter

*ρ*

_{space}was expressed in terms of the correlation between locations separated by the mean cone spacing in the mosaic. Measurements of natural images indicate that they are characterized by a significant nearest neighbor correlation (e.g., Burton & Moorehead, 1987; Field, 1987; Párraga et al., 1998, 2002; Pratt, 1978; Ruderman et al., 1998).

^{1}Denote by

**x**

_{space_1}a random draw from the normal distribution with mean vector

**u**

_{space_1}and covariance matrix

**K**

_{space_1}:

*p*(

**x**

_{space_1}) ∼

*N*(

**u**

_{space_1},

**K**

_{space_1}). In the absence of computational limitations, this is the vertical/horizontal prior we would have used. In practice, however, we needed to reduce the dimensionality of the quantities used in the calculations still further, so that as a second step we approximated this desired spatial prior. We used the singular value decomposition to express

**K**

_{space_1}=

**UDV**′ and took the linear model

_{space_1}to be the first

*N*

_{linmod}columns of

**U**. We then constructed a mean vector

_{space_1}as the vector of weights that provided the best least-squares approximation to

**u**

_{space_1}≈

_{space_1}

_{space_1}, and we constructed a diagonal covariance matrix

_{space_1}whose diagonal elements were the first

*N*

_{linmod}singular values of

**K**

_{space_1}(that is, the first

*N*

_{linmod}diagonal entries of

**D**). We then approximated

*p*(

**x**

_{space_1}) by

*p*(

**w**

_{space_1}) ∼

*N*(

_{space_1},

_{space_1}).

**x**

_{color}at each location were distributed according to

*p*(

**x**

_{color}) ∼

*N*(

**u**

_{color},

**K**

_{color}). We found

**u**

_{color}and

**K**

_{color}by finding the mean vector and the covariance matrix of a sample of 10000 cone coordinate vectors that were generated as follows. First we generated a sample of 10000 CIE

*u*′

*v*′ chromaticity vectors (first entry

*u*′, second entry

*v*′) by drawing from bivariate normal distribution with mean

**u**

_{ u′ v′}and diagonal covariance matrix

**K**

_{ u′ v′}. The matrix

**K**

_{ u′ v′}was diagonal, and both of its entries were the same and given by parameter

*σ*

_{ u′ v′}

^{2}. We took

**u**

_{ u′ v′}to be the chromaticity of CIE standard illuminant D65. We converted each of the 10000 chromaticity vectors to a normalized cone coordinate vector, with the magnitude of these vectors chosen so that the mean of the L- and M-cone coordinates was unity. We then scaled each of the normalized cone coordinate vectors by an intensity factor. The intensity factor used for each normalized cone coordinate vector was obtained by an independent draw from a univariate normal distribution,

*p*(

*f*

_{color}) ∼

*N*(

*μ*

_{factor},

*σ*

_{factor}

^{2}). We set

*μ*

_{factor}so that the mean of the L- and M-cone coordinates of the set of 10000 scaled cone coordinate vectors was equal to the mean L- and M-cone coordinate of the ensemble of stimuli being simulated. For the flashed spot simulations, we used a single value for

*μ*

_{factor}for all observers and wavelengths. This factor was chosen to make the prior mean intensity equal to the mean intensity (over observers and wavelengths) required for 50% frequency of seeing. We set

*σ*

_{factor}

^{2}=

*κ*

_{color}

*μ*

_{factor}, where

*κ*

_{color}was a specified constant. The resulting set of 10000 scaled cone coordinate vectors was the set used to determine

**u**

_{color}and

**K**

_{color}.

*p*(

**) is that the properties of this prior were controlled by a relatively small number of parameters. Although the dimensionality of**

*x***was typically ∼30,000, the prior was specified by parameters u**

*x*_{space},

*σ*

_{space}

^{2},

*ρ*

_{space},

*N*

_{linmod},

**u**

_{ u′ v′},

*σ*

_{ u′ v′}

^{2},

*μ*

_{factor}, and

*κ*

_{color}. We did not systematically vary u

_{space},

*N*

_{linmod},

**u**

_{ u′ v′}, or

*μ*

_{factor}, so that in effect the prior was controlled by four parameters. Permitting only a small number of degrees of freedom for the prior is crucial to managing the complexity of computational studies such as the one reported here.

_{noisymosaic}using standard formulae (Brainard, 1994; Gelman et al., 2004; see also Pratt, 1978, development of discrete Wiener estimator). This posterior mean is also the value of that maximizes the posterior, and we take it as our estimate

_{estimate}.

*x*_{estimate}=

**I**

*y*_{noisymosaic}+

*i*_{0}, where

**I**is an (

*N*

_{pixel}×

*N*

_{pixel}× 3) ×

*N*

_{cones}matrix and

*i*_{0}is an (

*N*

_{pixel}×

*N*

_{pixel}× 3) vector. Figure 6 provides an intuitive interpretation of the form of this estimator.

*σ*

_{ u′ v′}

^{2}= [0.07 0.08 0.09 0.10],

*σ*

_{space}

^{2}= [0.25 0.5 1.0],

*ρ*

_{space}= [0.75 0.85 0.95], and

*κ*

_{color}= [2 4 8]. For each combination, we simulated 2000 flashed spots at each wavelength for each observer and extracted the chromaticities of the reconstructed seen spots. We then fit the color boundaries to maximize the agreement between predicted and measured naming percentages. Since the simulation involves a random component (spot locations and added noise on each simulated trial), we were cautious about the possibility that some of the boundary fitting might be explaining random rather than systematic variation. To avoid this, we resimulated the experiment for each of the 108 prior choices and computed the correlation between predicted and named percentages using the corresponding boundary that was fit on the first run. We selected the prior parameters that led to the highest second run correlation, without refit of the naming boundaries.

*σ*

_{ u′ v′}

^{2}= 0.09,

*σ*

_{space}

^{2}= 0.25,

*ρ*

_{space}= 0.75, and

*κ*

_{color}= 4. The computed correlations between cone classes corresponding to these parameters were LM: 0.94, LS: 0.65, and MS: 0.70. Large correlations between cone classes are typical in natural images (Burton & Moorehead, 1987; Ruderman et al., 1998; see also Nascimento, Ferreira, & Foster, 2002; also Jaaskelainen, Parkkinen, & Toyooka, 1990; Maloney, 1986). Figure S3 shows three draws from the prior. The images appear as blurry desaturated colored noise, consistent with the fact that prior incorporates spatial correlations and correlations across cone classes, but not any model of the higher-order spatial structure that occurs in natural images (Simoncelli, 2005).

*a priori*than a red/green grating. In addition, the high correlation between neighboring pixels in the prior makes low spatial frequency stimuli more likely

*a priori*than high spatial frequency stimuli. As the spatial frequency of the stimulus gratings increases, the reconstructed power falls off in part because of optical blurring. But this blurring affects isochromatic and red/green isoluminant gratings to about the same degree. The differential effect seen in the top panel of the figure occurs because the prior, which favors low frequency interpretations in the face of ambiguity introduced by trichromatic sampling, begins to dominate earlier for red/green gratings than for isochromatic gratings. For observer BS (bottom panel), who has very few M cones, the spatial falloff for red-green gratings is comparable to that of the S-cone isolating gratings. The difference in predicted performance for the two observers suggests that carefully tailored experiments with gratings might be able to reveal individual differences in L:M cone ratio.

*x*-axis shows the number of L cones among the 10 cones nearest to the location of the flashed spot; the

*y*-axis shows the corresponding predicted percentage of flashes named white. Each individual line in the plot was derived from the data for a different observer and spot wavelength. The individual lines do not span the entire

*x*-axis because there were not enough flashes in every bin to compute a meaningful prediction; we only plotted data for bins that contained more than 10 seen flashes (from simulations of 2000 flashes).

*x*_{estimate}=

**I**

*y*_{noisymosaic}+

*i*_{0}. As emphasized by Brainard et al. (2006), in general the neural implementation of a Bayesian calculation need not resemble its implementation on a digital computer. Here, however, examination of the form of the estimator allows straightforward interpretation of the algorithm in neural terms. If we neglect the

_{0}term, we see that the reconstructed image is the weighted sum of a set of basis images, one from each column of the matrix

**I**, with the weight on each column simply being the magnitude of the response of the corresponding cone. We know from examination of these basis images that each represents a blurred colored blob, as in the lower panels of Figures 7 and 8. Thus, we can conceive of a set of neurons, one for each foveal cone, each of which represents a single basis image. A set of second stage neurons could then pool the output of the first to produce a representation of the Bayesian image reconstruction.

*N*

_{neighbors}= 4 and

*y*

_{criterion}= 10. The quality of the predictions was not highly sensitive to variation of these parameters around the chosen values. For each observer and wavelength, a scale factor linking model flash intensity (arbitrary) units to physical flash intensity was chosen to optimize the agreement between predictions and data. Blue: 500 nm; green: 550 nm; red: 600 nm. Observer identity is indicated in the inset of each panel.

*N*

_{neighbors}= 4 and

*y*

_{criterion}= 10. The quality of the predictions was not highly sensitive to variation of these parameters around the chosen values. For each observer, a single scale factor linking model flash intensity (arbitrary) units to physical flash intensity was chosen to optimize the agreement between predictions and data simultaneously for all three flash wavelengths. Blue: 500 nm; green: 550 nm; red: 600 nm. Observer identity is indicated in the inset of each panel.