There is an important distinction between the human perception of colors and the use of wavelength to guide eye growth. Because perceptual color vision is based on the relative activations of red, green, and blue cones, with just three monochromatic light sources one can cover approximately 80% of the human color gamut (
Song, Li, & Liu, 2018). In practice, the finite spectral bandwidth of real light sources limits the color gamut of contemporary computer monitors and flat-screen televisions to somewhat less than this, but mixtures of red, green, and blue can nonetheless generate such a broad range of perceptual colors that much of our video technology is built on these three primaries alone. However, because the focal plane of light varies continuously with wavelength, for studies of emmetropization we must consider the entire visible spectrum (
Autrusseau, Thibos, & Shevell, 2011). Therefore, in the present study, we used a set of 25
hyperspectral images that were obtained from a publicly available database (
Chakrabarti & Zickler, 2011). Hyperspectral images differ from the images of a standard digital camera in that, rather than each pixel containing only three spectral values (red, green, and blue), each pixel contains intensity values at multiple wavelengths spanning the entire visible spectrum (
Lu & Fei, 2014). An example of one of the hyperspectral images that we used is given in
Figure 1. For each hyperspectral image, we calculated the effective images as sensed separately by the short-wavelength-sensitive (SWS) and the combined medium- and long-wavelength-sensitive (MLWS) cone arrays as a function of optical defocus.
The present model extends to humans and to full-spectrum (hyperspectral) images a model of emmetropization that was developed in dichromatic tree shrews using monochromatic images (
Gawne & Norton, 2020). This model assumes that emmetropization is guided by the difference between the focus of shorter wavelengths, detected by the array of SWS cones, and the focus of longer wavelengths, detected by the pooled array of MWS and LWS cones (MLWS). Greater image clarity detected by the SWS cone array, relative to the MLWS cone array, produces a drive for the eye to elongate; greater relative image clarity detected by the MLWS cone array produces a drive to slow eye growth. For each of the 25 hyperspectral images, we calculated, for several levels of myopic (minus) and hyperopic (plus) defocus, the pattern of activation of the SWS and MLWS cone photoreceptor arrays. More specifically, for each image and level of defocus, we derived a separate effective image for each cone array.
We assumed that the optical power of the human eye is 60 D (
Keating, 2002), which results in a focal length of 16.667 mm (1000 mm × 1/60). A power of 59 D would result in a focal length of 16.949 mm, so 1 D of defocus should correspond to an axial distance of approximately 282 µm. One degree of visual angle (≈ 1/57 radian) times the focal length translates to a distance across the retina of about 291 µm. We used a set of publicly available hyperspectral images (
http://vision.seas.harvard.edu/hyperspec/) (
Chakrabarti & Zickler, 2011). The total size of the download is large, at around 8.7 gigabytes, and the authors ask that readers wanting copies of these images email them for the download link (ayanc[at]eecs[dot]harvard[dot]edu).
Each image spans 1392 pixels horizontally and 1040 pixels vertically. Each individual pixel contains spectral data from 31 different wavelength bands, from 420 nm to 720 nm in 10-nm increments. We assumed that each spatial pixel in the image maps to 4 µm across the surface of the retina; therefore, an image 1392 pixels wide would span about 5568 µm across the retinal surface and correspond to about 19.1 degrees of visual angle. Because real-world images tend to have similar statistics at different scales (
Field, 1987;
Ruderman, 1994) and because any given image can be arbitrarily made to occupy a smaller or larger field of view just by moving closer or farther away, we made no attempt to match the real-world distance from the visual scene to the camera.
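These geometric conversions can be summarized in a short calculation. The following sketch (in Python; not the code used for the analysis) simply reproduces the numbers quoted above, with constant names of our own choosing.
\begin{verbatim}
import math

# Sketch of the geometric conversions described in the text.
EYE_POWER_D = 60.0
FOCAL_LENGTH_UM = 1e6 / EYE_POWER_D                   # ~16,667 um (16.667 mm)

# Axial distance corresponding to 1 D of defocus (~282 um).
um_per_diopter = 1e6 / (EYE_POWER_D - 1.0) - FOCAL_LENGTH_UM

# Retinal distance subtended by 1 degree of visual angle (~291 um).
um_per_degree = FOCAL_LENGTH_UM * math.radians(1.0)

# Retinal extent of a 1392-pixel-wide image at 4 um per pixel
# (~5568 um, or ~19.1 degrees of visual angle).
PIXEL_SIZE_UM = 4.0
image_span_um = 1392 * PIXEL_SIZE_UM
image_span_deg = image_span_um / um_per_degree
\end{verbatim}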
We assumed that our images were all at optical infinity. For outdoor scenes, most of the visual world is at approximately the same level of defocus (
Flitcroft, 2012), so this seems a reasonable assumption. Indoor scenes, however, have a wide variety of distances to different objects, and, even though the emmetropization system mostly is driven by the most distant objects (
Troilo et al., 2019), it has been proposed that this lack of “dioptric flatness” in indoor scenes could interfere with emmetropization (
Flitcroft, 2012), a possibility that we do not consider here but which should be explored.
Because hyperspectral image capture is slow, moving objects such as trees in the wind can cause significant spatial and spectral artifacts; we therefore used only a subset of 25 images that were free of motion artifacts (as determined by the authors of the database). We also excluded two images as outliers because they were nearly devoid of spatial texture. A list of the specific images used is given in Supplementary Figure S1. At each wavelength, we divided by the sensitivity function of the hyperspectral camera (the file calib.txt in the database) to yield equal sensitivity in every wavelength band.
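As an illustration of this per-band normalization, a minimal sketch might look as follows, assuming that each hyperspectral cube has been loaded as a (rows × columns × 31) array and that calib.txt parses as one sensitivity value per band; the variable and function names are ours.
\begin{verbatim}
import numpy as np

# Band centers of the database images: 420-720 nm in 10-nm steps (31 bands).
wavelengths_nm = np.arange(420, 721, 10)

# Assumed to parse as one camera-sensitivity value per band, shape (31,).
camera_sensitivity = np.loadtxt("calib.txt")

def equalize_bands(cube):
    """Divide each spectral band by the camera sensitivity for that band."""
    return cube / camera_sensitivity[np.newaxis, np.newaxis, :]
\end{verbatim}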
Normal human pupil size can vary from 2 to 4 mm in bright light and from 4 to 8 mm in dim light (
Spector, 1990). Here, we assumed an intermediate pupil diameter of 4 mm. We calculated the amount of LCA at a given wavelength by the following previously published formula (
Thibos et al., 1992):
\begin{equation*}\begin{array}{@{}l@{}} Rx = p - q/\left( {\lambda - c} \right)\\ p = 1.68524\\ q = 0.63346\\ c = 0.21410 \end{array}\end{equation*}
where
λ is the wavelength (in µm; divide by 1000 to convert from nm), and
Rx is the LCA (in diopters).
This published formula sets zero LCA at approximately 600 nm, but we added 0.53 D to Rx to re-center the formula so that it gives 0 D of LCA at 500 nm. Cone absorptances as a function of wavelength were taken from the Irradiance Toolbox (
Lucas et al., 2014). Although most humans are trichromats, with short (S), medium (M), and long (L) wavelength sensitive cones, as previously discussed there are several reasons why the distinction between M and L cones might not be important for emmetropization (
Gawne & Norton, 2020). Therefore, we created a normalized absorptance profile (ML) by combining the profiles of the M and L cones. The profiles for the SWS and combined MLWS cones were then interpolated to the wavelengths of the hyperspectral images.
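A minimal sketch of these two steps, assuming the Irradiance Toolbox absorptance curves have been loaded into arrays (the argument names below are placeholders, not names from the toolbox), is given here; because the pooled profile is renormalized, it does not matter whether the M and L profiles are summed or averaged.
\begin{verbatim}
import numpy as np

# Constants of the LCA formula above (Thibos et al., 1992).
P, Q, C = 1.68524, 0.63346, 0.21410

def lca_diopters(wavelength_nm):
    """Re-centered LCA: the published formula plus 0.53 D, so that LCA = 0 D
    at 500 nm."""
    lam_um = wavelength_nm / 1000.0          # formula expects micrometers
    return P - Q / (lam_um - C) + 0.53

def pooled_cone_profiles(toolbox_nm, s_abs, m_abs, l_abs, image_nm):
    """Pool the M and L absorptances into one MLWS profile, normalize both
    profiles, and interpolate them to the image wavelengths."""
    ml_abs = m_abs + l_abs
    sws = np.interp(image_nm, toolbox_nm, s_abs / s_abs.max())
    mlws = np.interp(image_nm, toolbox_nm, ml_abs / ml_abs.max())
    return sws, mlws
\end{verbatim}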
For each image, we calculated the effective cone images with the optics held fixed but with the retina at positions corresponding to defocus from –3 D to +3 D in 0.25-D steps. In accordance with standard clinical practice, we used negative diopters to denote myopic defocus, because even though a myopic eye has too much optical power, it requires a negative lens to correct it.
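For concreteness, the defocus conditions and corresponding retinal positions might be set up as follows; the linear conversion of roughly 282 µm per diopter and the sign convention (negative, myopic defocus placing the retina beyond the focal plane) are our reading of the text rather than the original code.
\begin{verbatim}
import numpy as np

FOCAL_LENGTH_UM = 16667.0     # in-focus retinal position for a 60-D eye
UM_PER_DIOPTER = 282.0        # approximate axial distance per diopter (see above)

# -3 D to +3 D in 0.25-D steps (25 levels); negative values denote myopic defocus.
defocus_d = np.arange(-3.0, 3.0 + 0.25, 0.25)

# Assumed conversion: myopic (negative) defocus places the retina beyond the
# focal plane, hyperopic (positive) defocus in front of it.
retina_positions_um = FOCAL_LENGTH_UM - defocus_d * UM_PER_DIOPTER
\end{verbatim}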
Modeling the optics of the human eye can be quite involved, as there are many different factors, such as higher-order aberrations and diffraction. However, because we were concerned with determining whether the SWS (blue) cones could play a role in using chromatic cues to guide emmetropization, because these cones have a spatial Nyquist sampling frequency in the range of 3 to 5 cycles per degree (cpd) (
Calkins, 2001) (approximately one-tenth that of nominal human perceptual visual acuity), and because higher-order aberrations and diffraction have very little effect in this spatial frequency range (
Artal, 2014), here we used a simplified optical model (see Discussion for more details).
When imaging a point of light at optical infinity, a perfect optical system that is in focus produces a physical image that is itself a single point. If the optical system is out of focus, the image is instead a relatively uniform disk: the circle of confusion, or CoC (
Strasburger, Bach, & Heinrich, 2018). We calculated the diameter of this disk at a specific wavelength and defocus by first using the above formula for LCA, converting from diopters to axial distance (
dx), and then creating a modified focal length:
\begin{equation*}\textit{flreal} = 16{,}667 + \textit{dx}\end{equation*}
where
flreal and
dx are in µm. The diameter of the CoC is then given by
\begin{equation*}\textit{CoC} = \left| {\frac{\textit{flreal} - \textit{retinapos}}{\textit{flreal}}} \right| \times \textit{pupil}\end{equation*}
where
retinapos is the position of the retina (µm), and
pupil is the diameter of the pupil (µm). Here, we always set pupil = 4000 µm. The basic optical model is illustrated in schematic form in
Figure 2.
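Combining the LCA formula with the two equations above, a sketch of the blur-disk calculation for one wavelength and one retinal position might be as follows; lca_diopters is the re-centered LCA function sketched earlier, and the conversion of diopters to micrometers of axial distance again uses the approximately 282 µm-per-diopter figure.
\begin{verbatim}
def coc_diameter_um(wavelength_nm, retinapos_um, pupil_um=4000.0):
    """Diameter (um) of the circle of confusion for one wavelength and one
    retinal position, following the flreal and CoC formulas above."""
    UM_PER_DIOPTER = 282.0
    dx = lca_diopters(wavelength_nm) * UM_PER_DIOPTER  # axial shift of focus (assumed sign)
    flreal = 16667.0 + dx                              # wavelength-dependent focal length
    return abs((flreal - retinapos_um) / flreal) * pupil_um
\end{verbatim}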
We then created two 401 × 401 arrays to represent the blur disks, one for the SWS cones and one for the MLWS cones; this size was chosen to ensure that the blur disks did not become larger than the arrays under the conditions used here. Initially, these arrays were set to all zeros. In each, we set a normalized blur disk, centered in the middle, with a diameter equal to the CoC and a constant integrated intensity within that diameter (so that total light was conserved regardless of the amount of blur). We then calculated the two-dimensional (2D) convolution of this blur disk with the hyperspectral image at this specific wavelength. Separately for the SWS and MLWS cones, we weighted this intermediate image by the absorptance of the cones at this wavelength, multiplied it by the wavelength to convert from energy to relative photon counts, and then summed across all wavelengths. The result was two effective “images” (that is, 2D patterns of activity across the retinal surface): one for the SWS cones and one for the MLWS cones. We normalized each image to the same mean, because cone photoreceptors normalize their activity over a range of light intensities spanning several orders of magnitude (
Burkhardt, 1994). These effective images are what the arrays of SWS and MLWS cones would sense, given the specific hyperspectral image and specific level of defocus. This process is shown schematically in
Figure 3.
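The per-image computation just described can be sketched as follows: blur_disk builds the normalized disk in a 401 × 401 array of zeros, and effective_cone_image applies it band by band, weights by cone absorptance and wavelength, sums across bands, and normalizes the result. The function names are ours, and SciPy's fftconvolve stands in for whatever 2D convolution routine was actually used.
\begin{verbatim}
import numpy as np
from scipy.signal import fftconvolve

def blur_disk(coc_um, pixel_um=4.0, size=401):
    """Uniform disk of diameter coc_um (um), centered in a size x size array of
    zeros and normalized so that its integrated intensity is constant (1.0)."""
    radius_px = max(coc_um / pixel_um / 2.0, 0.5)   # at least one pixel of support
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    disk = (x**2 + y**2 <= radius_px**2).astype(float)
    return disk / disk.sum()

def effective_cone_image(cube, wavelengths_nm, absorptance, retinapos_um):
    """Blur each band by its own circle of confusion, weight by cone absorptance
    and by wavelength (energy -> relative photon count), sum across bands, and
    normalize to a common mean. coc_diameter_um is the function sketched above."""
    out = np.zeros(cube.shape[:2])
    for i, wl in enumerate(wavelengths_nm):
        psf = blur_disk(coc_diameter_um(wl, retinapos_um))
        out += fftconvolve(cube[:, :, i], psf, mode="same") * absorptance[i] * wl
    return out / out.mean()   # same mean for the SWS and MLWS images
\end{verbatim}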
We analyzed these two effective SWS and MLWS images by calculating the radially averaged 2D Fourier transform using code provided by Lawrence Sincich (
Adams, Sincich, & Horton, 2007). For each image and defocus level, we calculated the hyperspectral drive, which we defined as the difference between the radially averaged amplitude spectra of the SWS and MLWS images (SWS − MLWS), and we then plotted the drive function averaged across all 25 images as a function of spatial frequency and optical defocus.
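Finally, a simple stand-in for the radially averaged Fourier analysis (not the cited Sincich code) and for the per-image drive might look like this:
\begin{verbatim}
import numpy as np

def radial_average_amplitude(img):
    """Radially averaged 2D Fourier amplitude spectrum, binned by integer radius
    in cycles per image (a stand-in for the cited analysis code)."""
    amp = np.abs(np.fft.fftshift(np.fft.fft2(img - img.mean())))
    cy, cx = np.array(amp.shape) // 2
    y, x = np.indices(amp.shape)
    r = np.hypot(y - cy, x - cx).astype(int).ravel()
    sums = np.bincount(r, weights=amp.ravel())
    counts = np.bincount(r)
    return sums / np.maximum(counts, 1)

def hyperspectral_drive(sws_img, mlws_img):
    """Drive for one image at one defocus level: SWS minus MLWS radially
    averaged amplitude, as defined in the text."""
    return radial_average_amplitude(sws_img) - radial_average_amplitude(mlws_img)
\end{verbatim}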