Abstract
To this day, the most popular method for testing visual field defects (VFDs) is subjective standard automated perimetry (SAP). However, a need has arisen for an objective and less time-consuming method. Pupil perimetry (PP), which uses pupil responses to onsets of bright stimuli as indications of visual sensitivity, fulfills these requirements. It is currently unclear which PP method most accurately detects VFDs. Hence, the purpose of this study is to compare three PP methods for measuring pupil responsiveness.
Unifocal (UPP), flicker (FPP), and multifocal PP (MPP) were compared by monocularly testing the inner 60 degrees of vision at 44 wedge-shaped locations. The visual field (VF) sensitivity of 18 healthy adult participants (mean age and SD 23.7 ± 3.0 years) was assessed under three artificially simulated scotoma conditions (i.e. the stimulus was absent or only partially present), each for approximately 4.5 minutes: quadrantanopia, a 20-degree diameter scotoma, and a 10-degree diameter scotoma.
Stimuli that were fully present on the screen evoked the strongest, partially present stimuli evoked weaker, and absent stimuli evoked the weakest pupil responses in all methods. However, the pupil responses in FPP showed stronger discriminative power for present versus absent trials (median d-prime = 6.26 ± 2.49, area under the curve [AUC] = 1.0 ± 0), whereas MPP performed better for fully present versus partially present trials (median d-prime = 1.19 ± 0.62, AUC = 0.80 ± 0.11).
We conducted the first in-depth comparison of three PP methods. Gaze-contingent FPP had the best discriminative power for large (absolute) scotomas, whereas MPP performed slightly better with small (relative) scotomas.
Pupil size data were restructured from the continuous recording with an event-related approach using a series of steps. First, blink periods were deleted from the continuous data. Blink on- and offsets were detected by setting a speed threshold of three standard deviations (SDs) above the mean. The removed blink periods were interpolated with a cubic method using the interp1 MATLAB function. In the case of UPP, we used the trial start events to window pupil responses to each trial (and thus to each wedge; every 6 seconds) in 3-second epochs. For FPP, we chose a 5-second epoch between 1 and 6 seconds after stimulus onset, thereby ignoring the initial constriction in the first second, which tends to have a divergent and variable amplitude that complicates accurate FFT power estimations. In the case of MPP, we applied an event-related approach, creating 3000 ms epochs per luminance change (every 250 ms). The pupil data were then filtered per trial. Pupil traces from trial start to trial end were saved in a matrix with each row representing a trial and each column representing a timepoint. Pupil sizes were then baseline corrected by subtracting pupil size at stimulus onset. Except for MPP, pupil size was filtered for low-frequency noise by subtracting a low-pass Butterworth fit (third order, 0.2 Hz cutoff). This correction allowed comparisons across participants, and for FPP it ensured that the 2 Hz signal fluctuated around zero for proper frequency analyses. In addition, pupil size data were filtered to remove high-frequency noise by applying a low-pass Butterworth filter (fifth order, 30 Hz cutoff). UPP and FPP trials were removed if the pupil size variance within a trial crossed a threshold of four SDs above the mean. This removal procedure was iterated in three loops. Note that for UPP and MPP, the pupil size moves back to baseline before the end of the 3-second epoch duration.
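The baseline correction and the iterated trial removal described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code (which is in MATLAB). It assumes that the first sample of each epoch coincides with stimulus onset, and that the "mean" and "SD" of the rejection rule are taken over the per-trial variances of the trials still retained. The function names and the toy data are hypothetical.

```python
from statistics import mean, pstdev, pvariance


def baseline_correct(trial):
    """Subtract the pupil size at stimulus onset from every sample.

    Assumption: the first sample of the epoch is the stimulus-onset sample.
    """
    onset = trial[0]
    return [sample - onset for sample in trial]


def reject_noisy_trials(trials, n_sd=4.0, n_loops=3):
    """Drop trials whose within-trial variance exceeds mean + n_sd * SD.

    Assumption: mean and SD are computed over the per-trial variances of the
    trials still retained, and are recomputed on each of the n_loops passes.
    """
    kept = list(trials)
    for _ in range(n_loops):
        if len(kept) < 2:
            break
        variances = [pvariance(t) for t in kept]
        threshold = mean(variances) + n_sd * pstdev(variances)
        survivors = [t for t, v in zip(kept, variances) if v <= threshold]
        if len(survivors) == len(kept):
            break  # nothing was removed, so later passes would not change anything
        kept = survivors
    return kept


# Toy data: 30 quiet trials plus one trial with very large fluctuations.
clean = [[5.0 + 0.01 * ((i + j) % 3) for j in range(10)] for i in range(30)]
noisy = [5.0, 9.0, 1.0, 8.0, 2.0, 7.0, 3.0, 9.0, 1.0, 5.0]
kept = reject_noisy_trials(clean + [noisy])
corrected = [baseline_correct(t) for t in kept]
```

With a four-SD criterion, a trial can only be rejected when enough trials are available, because the largest attainable z-score in a set of n values is bounded by n. The sketch therefore stops early when a pass removes nothing.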
Subsequently, pupil size as a function of time from trial or epoch onset was normalized across eccentricities. The average pupil trace for trials with stimulation at the largest eccentricity (fifth, outer ring) without scotomas served as a baseline, and any deviations from this average pupil trace at the other eccentricities were corrected. Pupil sensitivity was determined from the filtered pupil traces in a different manner per perimetry method. For UPP, the pupil constriction amplitude was used as a measure of pupil sensitivity. It was extracted per trial by subtracting the minimum pupil size within a 200 to 1200 ms time window after trial onset (i.e. the period in which a pupil constriction ends) from the maximum pupil size within a 0 to 500 ms time window after trial onset (i.e. the period in which a pupil constriction starts). For FPP, pupil oscillation power from a periodogram at 2 Hz served as a measure of pupil sensitivity. Full trial periods of pupil size were each converted to the frequency domain using a Lomb-Scargle algorithm. The conversion and calculation of pupil oscillation power was independent of, and thus not affected by, individual variability in phase (Naber et al., 2018; Portengen et al., 2021). For MPP, pupil sensitivity was operationalized as the absolute area under the event-related pupil response (ERPR) averaged across all luminance changes per wedge within a time window of 250 to 1500 ms (i.e. the period in which an ERPR was present and had not yet returned to baseline). For consistency, we refer to all three pupil measures as “pupil responsiveness” from now on.
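As an illustration of the three responsiveness measures, the following Python sketch computes each one from a single trace. It is a simplified reading of the description above, not the published analysis. The traces are assumed to be evenly sampled at a hypothetical rate fs with the first sample at trial or epoch onset. For FPP, a direct single-frequency Fourier projection stands in for the Lomb-Scargle periodogram. For MPP, the input is assumed to be the ERPR already averaged across luminance changes. All function names are illustrative.

```python
import math


def window(trace, fs, t_start, t_end):
    """Samples with t_start <= t <= t_end (in seconds); the trace starts at t = 0."""
    i0 = int(round(t_start * fs))
    i1 = int(round(t_end * fs))
    return trace[i0:i1 + 1]


def upp_constriction_amplitude(trace, fs):
    """Maximum pupil size in 0-500 ms minus minimum pupil size in 200-1200 ms."""
    return max(window(trace, fs, 0.0, 0.5)) - min(window(trace, fs, 0.2, 1.2))


def fpp_power_at(trace, fs, freq=2.0):
    """Power at one frequency via a direct Fourier projection.

    Stand-in for a Lomb-Scargle periodogram; valid for evenly sampled data.
    For a pure sinusoid of amplitude A over whole cycles this returns A**2 / 4,
    regardless of the phase of the oscillation.
    """
    n = len(trace)
    m = sum(trace) / n
    c = sum((x - m) * math.cos(2 * math.pi * freq * i / fs) for i, x in enumerate(trace))
    s = sum((x - m) * math.sin(2 * math.pi * freq * i / fs) for i, x in enumerate(trace))
    return (c * c + s * s) / n ** 2


def mpp_absolute_area(erpr, fs):
    """Trapezoidal area under |ERPR| between 250 and 1500 ms."""
    seg = [abs(x) for x in window(erpr, fs, 0.25, 1.5)]
    return sum((a + b) / 2 for a, b in zip(seg, seg[1:])) / fs


fs = 100  # hypothetical sampling rate in Hz
# UPP: a step-like constriction of 1 unit between 0.3 and 1.0 s.
upp_trace = [-1.0 if 0.3 <= i / fs < 1.0 else 0.0 for i in range(3 * fs)]
# FPP: a 2 Hz oscillation of amplitude 0.4 over a 5-second epoch.
flicker = [0.4 * math.sin(2 * math.pi * 2.0 * i / fs + 0.7) for i in range(5 * fs)]
# MPP: a constant deflection of -0.5 over 2 seconds.
erpr = [-0.5] * (2 * fs)

amplitude = upp_constriction_amplitude(upp_trace, fs)
power = fpp_power_at(flicker, fs)
area = mpp_absolute_area(erpr, fs)
```

Because the sine and cosine projections are squared and summed, the power estimate does not depend on the phase of the 2 Hz oscillation, which mirrors the phase independence noted above.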
To determine whether pupil responsiveness differed across scotoma types, we performed a repeated measures ANOVA. Two-dimensional pupil sensitivity maps were created with a harmonic spline interpolation to fill the gaps between the centers of the 44 stimulus wedge locations. The performance of each perimetry method was based on how well the method distinguished between present and absent stimuli across trials. The comparison across methods was made with the index d-prime (i.e. an index of the discriminability of a signal, given by the separation between the peaks of the probability distributions, defined in z-scores), the area under the curve (AUC) of the receiver operating characteristic (ROC), and the adjusted effect size for small sample sizes (Hedges' g). The d-prime and AUC values per participant were compared across the three methods with paired two-sided t-tests. Stimulus protocol scripts, data, and analysis files are available at https://osf.io/bqwk8.
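The three performance indices can be sketched in a few lines of Python. The exact formulas used in the study are not stated in the text, so the versions below are common textbook definitions and should be read as assumptions: d-prime as the difference in means divided by the root mean of the two variances, AUC as the probability that a randomly drawn "present" response exceeds a randomly drawn "absent" response (ties counted as one half), and Hedges' g as Cohen's d with the usual small-sample correction factor. The data values are made up for illustration.

```python
import math
from statistics import mean, variance


def d_prime(signal, noise):
    """Separation between the two distributions in SD units (assumed definition)."""
    pooled_sd = math.sqrt((variance(signal) + variance(noise)) / 2)
    return (mean(signal) - mean(noise)) / pooled_sd


def auc(signal, noise):
    """Area under the ROC curve via pairwise comparison; ties count as 0.5."""
    wins = sum(
        1.0 if s > n else 0.5 if s == n else 0.0
        for s in signal
        for n in noise
    )
    return wins / (len(signal) * len(noise))


def hedges_g(a, b):
    """Cohen's d with pooled SD, multiplied by the small-sample correction factor."""
    n1, n2 = len(a), len(b)
    pooled_sd = math.sqrt(
        ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    )
    d = (mean(a) - mean(b)) / pooled_sd
    return d * (1 - 3 / (4 * (n1 + n2) - 9))


# Hypothetical pupil responsiveness values for present and absent stimuli.
present = [3.0, 4.0, 5.0]
absent = [1.0, 2.0, 3.0]

dp = d_prime(present, absent)
area = auc(present, absent)
g = hedges_g(present, absent)
```

An AUC of 1.0, as reported for FPP in present versus absent trials, corresponds to the case in which every "present" response is larger than every "absent" response.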
This is the first study comparing three different PP methods. From our results, we can conclude that all three PP methods show high discriminative power for differentiating between present and absent stimuli, and between partially present and absent stimuli in healthy adults.
FPP in particular proved well suited to distinguishing between present and absent (and partially present) stimuli. One explanation for FPP's high diagnostic accuracy might be that the combination of the single stimulus presentation with an increased number of pupillary measurements in a short time period resulted in multiple, reliable phasic pupil responses (i.e. decreasing the chance of incidental pupil fluctuations). These responses could in turn be particularly well suited to distinguishing within-field anisotropies, as opposed to looking at average sensitivity across the VF, and to distinguishing between damaged and intact VFs in a clinical setting (
Naber et al., 2018). Others used flickering stimuli at higher frequencies (i.e. 15 and 30 Hz;
James, Kolic, Bedford, & Maddess, 2012;
Sabeti, Maddess, Essex, Saikal, James, & Carle, 2014). However, frequencies above 3 to 4 Hz do not evoke the oscillating pupil responses inherent to the flickering method of this study (
Naber et al., 2013). The results suggest that a stimulus paradigm with high spatial sparseness and low temporal sparseness of events leads to the overall best power in dissociating present, partially present, and absent stimuli. The high pupil sensitivity for detecting hemianopic and quadrantanopic scotomas due to cortical damage, and glaucoma-caused scotomas, displayed in the first FPP study of Naber et al. (2018), supports the results found in this study.
Our results showed small between-subject differences for sensitivity measures across visibility conditions and PP methods. Conversely, large individual variation was seen in present versus partially present trials; distinguishing between these conditions remains a challenge when using PP methods (MPP performed only slightly better). This imprecision can partly be explained by the use of large stimulus sizes, which sacrifices spatial precision in the peripheral VF. The presentation of large stimuli is a prerequisite for evoking more reliable pupil responses, but results in coarse sensitivity maps.
It is also not yet possible to dissociate exact VFD locations within a stimulus wedge. To resolve this, a stimulus map similar to that used by Maddess, Essex, Kolic, Carle, & James (2013), which uses overlapping stimuli shown at different time intervals, or smaller stimuli at more locations, as in Naber et al. (2018), could be used (with weaker pupil responses as a result). Thus, PP methods are currently more suited for screening purposes than for regular follow-up and monitoring of small changes in the VF across time. Conversely, because of the flexible setup of pupil perimetry, protocols can easily be interchanged and adjusted. Varying PP protocols could be incorporated for different goals: larger and fewer stimuli to quickly screen for clinically significant VFDs, and smaller stimuli at more locations to accurately detect small changes during follow-up. Further development could entail automation of a direct diagnostic report and a scotoma edge detection algorithm.
Note, however, that improvements can still be made to the current PP paradigms. Most developments have been reported for MPP (
Carle, James, Kolic, Essex, & Maddess, 2015;
Carle, James, Sabeti, Kolic, Essex, Shean, Jeans, Saikal, Licinio, & Maddess, 2022;
Sabeti, James, & Maddess, 2011;
Sabeti, James, Essex, & Maddess, 2013;
Tan et al., 2001;
Wilhelm et al., 2000). Our MPP variant was performed with a relatively high framerate (a possible change in luminance every 250 ms) and long stimulus-on durations and thus differed from state-of-the-art MPP methods in some respects. For example, the method of
Wilhelm et al. (2000) involved a scaled honeycomb array and covered 50 degrees of VF; their stimuli were presented with a 50% probability in each test region, similar to the original electroretinogram (ERG) multifocal method proposed by
Sutter (1991;
Sutter & Tran, 1992);
Tan et al. (2001) created temporally more sparse stimuli by inserting blank frames between frames containing stimuli;
Sabeti et al. (2011) and
Ho, Wong, Carle, James, Kolic, Maddess, & Goh (2010) used colored stimuli and a higher presentation frequency, resulting in high temporal sparseness due to short stimulus durations and long interstimulus intervals. The most recent MPP method of
Carle et al. (2022) features a clustered volley technique, which brings the stimuli closer to each other, and longer interstimulus times than previous iterations, making it more similar to FPP with respect to spatiotemporal properties. However, Carle et al.’s MPP method also implements color, luminance balancing (i.e. variance in luminance across stimulus locations), and no black stimulus-off region. Nonetheless, these improvements can also be implemented in FPP (and UPP), meaning that the differences across PP methods reported here remain valid despite the use of rather basic stimulus paradigms.
It is possible that pupil responses become more sensitive when evoked with fewer stimulus changes per second (e.g. 1 Hz instead of 2 Hz) and a spatial sparseness somewhere in between the range of 1 and approximately half of the 44 locations, as pupil responses seem to be stronger when more stimuli are shown, even at a constant luminance (
Castaldi, Pomè, Cicchini, Burr, & Binda, 2021). Although outside the scope of the current study, an optimal spatial and temporal sparseness remains to be found. Nonetheless, the main advantage of adopting a lower temporal sparseness lies in obtaining more data points per trial and consequently shorter testing times.
Although unifocal and flicker PP methods benefit from an attentional cueing paradigm (
Binda & Murray, 2015;
Einhäuser, 2017;
Mathôt & Van der Stigchel, 2015;
Naber et al., 2013;
Portengen et al., 2021), evidence has been provided that a centrally directed attentional task and covertly directed attention reduce signal quality in multifocal methods (
Rosli, Carle, Ho, James, Kolic, Rohan, & Maddess, 2018). This likely stems from attention being divided across multiple simultaneously shown stimuli.
The current study used a dark gray background to suppress the influence of stray light (seen with black backgrounds) and to increase pupil responsiveness (as compared to a lighter gray background;
Portengen et al., 2021). This testing method may be improved even more by implementing chromatic properties, such as hue, brightness, and saturation, to strengthen pupil response amplitudes driven by contrasts between those properties and to isolate the retinal opsin, rhodopsin, or melanopsin pathways (
Carle et al., 2015;
Chibel, Sher, Ben Ner, Mhajna, Achiron, Hajyahia, Skaat, Berchenko, Oberman, Kalter-Leibovici, Freedman, & Rotenstreich, 2016;
Maeda, Kelbsch, Straßer, Skorkovská, Peters, Wilhelm, & Wilhelm, 2017). The use of narrow-band yellow stimuli (around 580 nm) rather than full visible spectrum white stimuli (the latter include blue light) may reduce blue-sensitive melanopsin retinal ganglion cell activity and its effect on pupil responses, and could thereby contribute to a more accurate diagnosis of VFDs specifically caused by cortical damage (
Rosli et al., 2018).
A limitation of the current study is that no normative data from a “no scotoma condition” were used in the analysis, and left versus right VFs per participant may have contained small biases due to temporal versus nasal anisotropies. Although these biases did not hamper the comparison between methods, overall discriminative power could be weaker than it would have been had normative data been used. Another limitation pertains to the use of hard edges for the artificially simulated VFDs, which do not accurately represent real-world situations with actual VFDs. Although simulating VFDs in healthy participants is an established strategy (e.g.
Gestefeld et al., 2020), it does not mimic VFDs entirely. Real scotomas tend to have smooth edges with a gradual transition from visible to invisible. Due to limitations of the computer used, computing soft-edged wedges led to technical problems, such as slower frame rates. The wedges were created in real time to ensure a different order of appearance for each participant. In future studies, this could be resolved by creating multiple videos with random presentation orders in advance rather than relying on online stimulus buffering. Regardless, several studies showed promising results with PP in more realistic situations, such as detecting the blind spot (
Portengen et al., 2021) and testing patients suffering from VFDs (e.g.
Carle et al., 2015;
Chibel et al., 2016;
Kardon, 1992;
Maeda et al., 2017;
Naber et al., 2018;
Rajan, Bremner, & Riordan-Eva, 2002;
Schmid, Luedtke, Wilhelm, & Wilhelm, 2005;
Skorkovská, Lüdtke, Wilhelm, & Wilhelm, 2009;
Skorkovská, Wilhelm, Lüdtke, & Wilhelm, 2009;
Tan et al., 2001;
Yoshitomi, Matsui, Tanakadate, & Ishikawa, 1999). Future studies testing subjects with VF defects due to neurological impairment will further clarify the role of PP in testing VFs.
As a last point, it is important to stress PP's advantages over SAP. In addition to its high accuracy in detecting artificial scotomas, PP is an objective method for testing the VF in a short amount of time (approximately 4 minutes per eye and method). This is comparable to subjective fast SAP methods, such as the Swedish Interactive Testing Algorithm (SITA) 24-2 FAST and Tendency-Oriented Perimetry (TOP). Combined with the minimal cooperation required, this method might have merit for application in young children or neurologically impaired subjects affected by cerebral visual impairment, who generally show difficulties in completing an SAP test reliably (
Patel, Cumberland, Walters, Russell-Eggitt, Rahi, & OPTIC Study Group, 2015;
Wong & Sharpe, 2000). Current alternatives for young or neurologically impaired children are behavioral perimetry tests, such as the behavioral VF (BEFIE) screening test. The BEFIE test shows high specificity and sensitivity for absolute peripheral VFDs in neurologically impaired children (
Koenraads et al., 2015). Additionally, the BEFIE test detects VFDs 4 years earlier than SAP (
Portengen, Koenraads, Imhof, & Porro, 2020). However, limitations of the BEFIE test are the need for two assessors along with the inability to test the central VF and detect relative scotomas. PP circumvents these limitations and might be a suitable alternative to objectively test this patient group. Future studies may determine whether PP can map the VFs of children in an accurate, quick, and engaging way.
Supported by the ODAS foundation (grant number 2017-03), the Rotterdamse Stichting Blindenbelangen (grant number B20170004), the F.P. Fischer Foundation (grant number 170511), and a grant from the Janivo Foundation (grant number 2017170). M.N. is supported by a grant from UitZicht (grant 2018-10, fund involved: Rotterdamse Stichting Blindenbelangen).
Commercial relationships: none.
Corresponding author: Brendan L. Portengen.
Email: b.l.portengen-2@umcutrecht.nl.
Address: Department of Ophthalmology, University Medical Center Utrecht, PO Box 85500, Room E 03.136, 3508 GA Utrecht, The Netherlands.