Fifteen logarithmically spaced spatial frequencies per octave were used for the estimation, such that ω ∈ [1, 64] cycles per degree (i.e., 6 octaves) for Experiment 1 and ω ∈ [0.5, 32] cycles per degree (also 6 octaves) for Experiment 2. Contrasts used for the estimation spanned κ ∈ [0.001, 1] (i.e., 3 orders of magnitude) with 30 logarithmically spaced values per decade. Input values used for modeling, prediction, and quantification were therefore discretized over a 91 × 91 grid containing 8281 values. This resolution is arbitrary and could be modified as needed.
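The grid dimensions follow directly from the spacing: 6 octaves × 15 steps and 3 decades × 30 steps each give 90 intervals, hence 91 values per axis. A minimal sketch of this construction is given below; the helper name `log_grid` and the variable names are illustrative and are not taken from the original implementation.

```python
import math

def log_grid(lo, hi, steps_per_unit, base):
    """Logarithmically spaced values from lo to hi (inclusive), with
    steps_per_unit intervals per factor of `base` (2 = octave, 10 = decade)."""
    n_units = round(math.log(hi / lo, base))   # 6 octaves or 3 decades here
    n_steps = n_units * steps_per_unit         # 90 intervals in both cases
    return [lo * base ** (i / steps_per_unit) for i in range(n_steps + 1)]

# Spatial frequency axes (cycles per degree), 15 values per octave.
freqs_exp1 = log_grid(1.0, 64.0, 15, 2)
freqs_exp2 = log_grid(0.5, 32.0, 15, 2)

# Contrast axis, 30 values per decade.
contrasts = log_grid(0.001, 1.0, 30, 10)

# Full evaluation grid for Experiment 1 as (frequency, contrast) pairs.
grid_exp1 = [(w, k) for w in freqs_exp1 for k in contrasts]
```

Changing `steps_per_unit` is all that is needed to evaluate the estimator at a different resolution.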
Samples were generated either by drawing uniformly at random from all possible combinations of frequency and contrast on the evaluation grid, or by actively selecting the most informative next sample for estimation. In both cases, one of two types of prior was combined with newly observed data to form the posterior. One was uninformative, including no information about CSFs, and the other was informative, including canonical CSF shape information.
For the uninformative prior condition, the mean function of the GP was initialized with c = 0. Zero on the latent function maps to a probability of success of 0.5, implying that, without any data, the estimator assumes maximum uncertainty about the shape of the CRF. The covariance function was initialized such that s1 = s2 = l = 1. The intention of this prior was to allow the sampled data to speak for themselves and deliver a final estimate with few assumptions. In every condition, a set of phantom shaping data points was added to assist estimator convergence. These points indexed detection failures at locations well beyond any reported human CSF curve. At each octave of spatial frequency, a phantom failure at a contrast of 0.0005 was added. Another phantom failure was added at a spatial frequency of 128 cycles per degree and a contrast of 1, that is, at (128, 1).
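The phantom shaping points can be enumerated as in the following sketch. It assumes that "each octave" means the octave-spaced frequencies spanning the modeled range (1 to 64 cycles per degree for Experiment 1) and that a failure is coded as 0; the function name and the tuple layout are hypothetical.

```python
def phantom_failures(f_lo=1.0, f_hi=64.0):
    """Return phantom detection failures as (frequency, contrast, response).

    Assumptions for illustration: octave frequencies run from f_lo to f_hi
    inclusive, and response 0 denotes a detection failure.
    """
    points = []
    f = f_lo
    while f <= f_hi:
        points.append((f, 0.0005, 0))   # sub-threshold contrast at each octave
        f *= 2.0
    points.append((128.0, 1.0, 0))      # beyond the highest visible frequency
    return points

phantoms_exp1 = phantom_failures(1.0, 64.0)
```

Under these assumptions, Experiment 1 receives seven low-contrast phantom failures plus the single high-frequency one.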
For the informative prior condition, 1000 uniformly random samples across spatial frequency and contrast were divided equally among the four canonical phenotypes observed in the range of [0.5, 64] cycles per degree and labeled by the corresponding generative models. A single GP was fit over [1, 64] cycles per degree to this entire set of observations. The scaling factor of the linear kernel was then multiplied by 0.4 in order to expand the transition between stimulus response regions. This manipulation “flattened the prior” to weaken the bias it injected into the model while still staying informative. The posterior mean of this GP was then used to initialize the mean function of all later GPs. This exact prior was used for Experiment 1. For Experiment 2, the GP was fit over [0.5, 32] cycles per degree. The relationship between this prior and ground truth CSFs can be seen in Figures 1 and 2, revealing a wide transition between behavioral response regions overlapping the threshold boundaries.
The spatial frequency and contrast of the first eight samples for all experimental conditions were deterministically selected according to a Halton set (Halton, 1964). Halton sets are space filling but not random or grid-like, so every condition experienced identical primer sequences that sampled the stimulus space broadly. Subsequent stimuli were selected either randomly or actively. At the expense of some overall efficiency, this priming procedure promotes stable active learning. The primer sequence can alternatively be selected by other criteria, such as the set of stimuli determined from population screening studies to be most useful for distinguishing important phenotypes.
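A Halton primer of this kind can be sketched with the radical-inverse construction. The bases (2 and 3), the starting index of 1, and the mapping of the unit square onto logarithmic frequency and contrast axes are assumptions made for illustration; the source does not state these details.

```python
def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in the given base."""
    result, frac = 0.0, 1.0 / base
    while index > 0:
        result += (index % base) * frac
        index //= base
        frac /= base
    return result

def halton_primer(n, f_range=(1.0, 64.0), c_range=(0.001, 1.0)):
    """First n two-dimensional Halton points, mapped log-uniformly onto the
    frequency and contrast ranges (the mapping is an assumption)."""
    points = []
    for i in range(1, n + 1):
        u = radical_inverse(i, 2)   # coordinate along spatial frequency
        v = radical_inverse(i, 3)   # coordinate along contrast
        freq = f_range[0] * (f_range[1] / f_range[0]) ** u
        contrast = c_range[0] * (c_range[1] / c_range[0]) ** v
        points.append((freq, contrast))
    return points

primer = halton_primer(8)
```

Because the sequence is deterministic, repeated calls return the same eight stimuli, which is what allows every condition to share an identical primer.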
Two separate experiments were conducted to test the new estimator under different conditions. For Experiment 1, four ground truth generative models were created from the four canonical phenotypes depicted in Figure 1 via high-density sampling and cubic spline fitting. Simulated psychometric spreads were fixed at 0.08. These phenotypes were selected to demonstrate estimator performance under extremes of phenotypic variation. Four combinations of sampling methods (random, active) and prior selection (uninformative, informative) were used to acquire data from the generative models. After each new data point, the CRF was updated as a posterior defined over the entire input domain. The ϕ = 0.5 contour of the predictive posterior mean of the CRF became the CSF estimate because this value forms the equiprobable boundary between the two response classes. For the symmetric lapse and guess rates used here, this is also equivalent to ψ = 0.5. At octaves of spatial frequency relative to 0.5 cycles per degree, the root mean square error (RMSE) in units of log₁₀ contrast between the ground truth CSF and the estimated CSF was quantified. Spatial frequencies for which the CSF would have taken values greater than 1 were excluded from this calculation. The estimated CSF was discretized to the nearest contrast grid value. Each phenotype was evaluated separately for 10 repetitions and the average behavior summarized.
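The error metric can be sketched as follows. The sketch assumes that each CSF is expressed as a threshold contrast at each octave frequency and that the exclusion rule is applied to the ground truth curve; the function name and the toy values are hypothetical and do not correspond to any reported result.

```python
import math

def csf_rmse(true_csf, est_csf):
    """RMSE in log10 contrast between two CSFs given as
    {frequency: threshold contrast} dictionaries at octave frequencies.

    Frequencies whose ground truth threshold exceeds a contrast of 1 are
    skipped (assumption: the exclusion is based on the ground truth curve).
    """
    sq_errors = []
    for freq, true_thr in true_csf.items():
        if true_thr > 1.0:
            continue   # threshold not reachable at any displayable contrast
        diff = math.log10(est_csf[freq]) - math.log10(true_thr)
        sq_errors.append(diff ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))

# Toy example: one octave is off by a full log unit, one octave is excluded.
truth = {1: 0.01, 2: 0.005, 4: 0.01, 8: 0.1, 16: 1.5}
estimate = {1: 0.01, 2: 0.005, 4: 0.1, 8: 0.1, 16: 1.0}
rmse = csf_rmse(truth, estimate)
```

In this toy case four octaves contribute, with squared errors of 0, 0, 1, and 0, so the RMSE is 0.5 log₁₀ units.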
Because the canonical examples are overly smooth, Experiment 2 made use of generative ground truth models taken from a cohort of neurotypicals and a cohort of individuals diagnosed with schizophrenia performing a contrast detection task as part of a previous study (Yaghoubi et al., 2022). The study was conducted across three sites: Weill Cornell Medicine (WCM), Nathan S. Kline Institute for Psychiatric Research (NKI), and the University of California, Riverside (UCR). Seven neurotypical (NT) participants and 12 patients with schizophrenia (SZ) were recruited for the study. The total number of participants at each site was as follows: WCM: six (four males; age: mean = 33.5 years, SD = 8.48); NKI: six (three males; age: mean = 45.6 years, SD = 9.54); and UCR: seven (two males; age: mean = 19.93 years, SD = 2.15). All subjects reported normal or corrected-to-normal vision.
The gamified training paradigm used in this experiment was derived from Deveau, Ozer, & Seitz (2014). The task was administered on a second-generation 12.9-inch Apple iPad Pro at a luminance of 600 cd/m², a resolution of 2732 × 2048 pixels, and a pixel density of 264 pixels per inch. The viewing distance of all participants from the screen was 20 inches. The iPads used at all sites were calibrated similarly to reduce variance between sites. The stimulus set consisted of Gabor patches (targets) at six spatial frequencies. An initial test was performed for each group of participants to approximate the maximum spatial frequency that could be perceived. The spatial frequencies used for NT were 0.5, 1, 4, 8, 16 and 32 cycles per degree and those used for SZ were 0.5, 1, 2, 4, 8 and 16 cycles per degree. The stimuli were also presented in 8 orientations (0°, 22.5°, 45°, 67.5°, 90°, 112.5°, 135°, 157.5°). Gaussian windows of Gabors varied with σ between 0.25° and 1° and with phases (0°, 45°, 90°, 135°).
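As a rough check on the display geometry, the stated pixel density and viewing distance imply the angular resolution computed below. The calculation assumes a flat screen viewed head-on with the degree of visual angle centered at fixation; the variable names are illustrative and this calculation does not appear in the original study.

```python
import math

PIXELS_PER_INCH = 264          # stated pixel density
VIEWING_DISTANCE_IN = 20.0     # stated viewing distance, inches

# Width on the screen subtended by one degree of visual angle at fixation.
inches_per_degree = 2.0 * VIEWING_DISTANCE_IN * math.tan(math.radians(0.5))
pixels_per_degree = inches_per_degree * PIXELS_PER_INCH

# Highest spatial frequency representable at two pixels per cycle.
nyquist_cpd = pixels_per_degree / 2.0

# Pixels available per cycle at the highest tested frequency (32 cpd).
pixels_per_cycle_at_32 = pixels_per_degree / 32.0
```

Under these assumptions the display provides roughly 92 pixels per degree, so the highest tested frequency of 32 cycles per degree lies below the sampling limit of about 46 cycles per degree, with just under 3 pixels per cycle.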
Each group of participants underwent a slightly different training procedure. SZ participants trained for up to 40 sessions (one session per day), with each session lasting approximately 30 minutes, whereas NT participants performed 40 training sessions in 20 days (two sessions per day). Each session consisted of different blocks in which Gabor patches at all six spatial frequencies were presented.
Each block lasted 120 seconds, during which an array of targets with randomly selected orientation and increasing spatial frequency appeared all at once, scattered across the screen (Figure 3). The contrast of the target was adaptively determined using a three-down/one-up staircase. Contrast was decreased whenever 80% of the targets were selected and increased when fewer than 40% of the targets were selected, with a time limit of 2.5 seconds per target. Staircases were run independently on each spatial frequency across blocks of training. Spatially varying auditory feedback was given to the participants (i.e., low-frequency tones corresponded to targets at the bottom of the screen, whereas high-frequency tones corresponded to targets at the top of the screen). Thus the horizontal and vertical locations on the screen each corresponded to a unique tone. The sounds provided an important cue to the location of the visual stimuli and were included to boost learning, as has been found in studies of multisensory facilitation (Shams & Seitz, 2008).
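The block-level contrast rule described above can be sketched as follows. Only the 80% and 40% criteria come from the text; the step size (0.1 log₁₀ units), the contrast floor and ceiling, and the reading of "80%" as "at least 80%" are hypothetical choices made purely for illustration.

```python
def update_contrast(contrast, hit_fraction, step_log10=0.1,
                    floor=0.001, ceiling=1.0):
    """Return the contrast for the next block at one spatial frequency.

    hit_fraction is the proportion of targets selected in the last block.
    step_log10, floor, and ceiling are hypothetical parameters that are
    not specified in the source.
    """
    if hit_fraction >= 0.8:
        factor = 10.0 ** (-step_log10)   # performance high: make it harder
    elif hit_fraction < 0.4:
        factor = 10.0 ** step_log10      # performance low: make it easier
    else:
        factor = 1.0                     # otherwise leave contrast unchanged
    return min(max(contrast * factor, floor), ceiling)

# One staircase per spatial frequency, each updated independently.
staircases = {0.5: 0.2, 1.0: 0.2, 2.0: 0.2}
staircases[1.0] = update_contrast(staircases[1.0], hit_fraction=0.9)
```

Keeping one entry per spatial frequency mirrors the statement that staircases were run independently on each frequency.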
As with most CSF tests, the CSF values from this study were computed at a small number of discrete spatial frequencies. Spline interpolation was again used to create smooth ground truth CSFs, this time from individual participants’ CSF curves. A value of 0.08 was again used as the fixed psychometric spread to produce simulated contrast response values from the generative models. Raw data newly generated in this fashion were used to train a multidimensional GP probabilistic classifier, as in Experiment 1. Estimated CSF values were again compared to ground truth CSF values at octave spatial frequencies using RMSE. Instead of multiple repeats of the same phenotype, however, performance in this case was averaged across all 19 individuals in the data set.
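Simulating a binary response from a generative model can be sketched as below. The functional form is an assumption: a cumulative Gaussian in log₁₀ contrast centered on the ground truth threshold, with the spread of 0.08 used as its standard deviation and with lapse and guess rates omitted. The source does not specify the exact psychometric function, and the function names are hypothetical.

```python
import math
import random

def p_detect(contrast, threshold, spread=0.08):
    """Detection probability under an assumed cumulative Gaussian in log10
    contrast, centered on the threshold with standard deviation `spread`."""
    z = (math.log10(contrast) - math.log10(threshold)) / spread
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simulate_response(contrast, threshold, rng, spread=0.08):
    """Draw one binary response: 1 = detected, 0 = not detected."""
    return 1 if rng.random() < p_detect(contrast, threshold, spread) else 0

# Toy example: ground truth threshold of 0.01 at some spatial frequency,
# probed one log unit below, at, and one log unit above threshold.
rng = random.Random(0)
responses = [simulate_response(c, 0.01, rng) for c in (0.001, 0.01, 0.1)]
```

With a spread this narrow, stimuli one log unit away from threshold are missed or detected almost deterministically, so informative samples are those placed close to the threshold boundary.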
Standard machine learning tuning procedures to achieve consistent model convergence and high model accuracy were conducted only for Experiment 1. All estimator configurations for Experiment 2 were fixed at these values and not adjusted in an attempt to improve outcomes.