In this article, we address the problem of measuring and analyzing sensation, the subjective magnitude of one’s experience. We do this in the context of the method of triads: The sensation of the stimulus is evaluated via relative judgments of the following form: “Is stimulus

*Die Elemente der Psychophysik* (Fechner, 1860). Since Fechner’s seminal work, the “measurement of sensation magnitude”—nowadays typically referred to as “psychophysical scaling”—has been one of the central aims of psychophysics (Gescheider, 1988).

^{1}Psychophysical scaling is formally defined as the problem of quantifying the magnitude of sensation induced by a physical stimulus (Marks & Gescheider, 2002; Krantz, Luce, Suppes, & Tversky, 1971).

*discriminability* and *subjective magnitude* in a simple way. However, the Fechnerian approach—albeit sometimes successful—has been vigorously criticized for both theoretical and empirical reasons and cannot serve as a generic method to obtain a scaling function (e.g., Norris & Oliver, 1898; Stevens, 1957; Gescheider, 1988). Thurstonian scaling is an alternative approach proposed to solve the scaling problem in the tradition of linking discriminability to subjective magnitude, incorporating an internally variable mapping from stimulus to sensation (internal “noise” in modern parlance) (Thurstone, 1927). Thurstonian scaling is based on the discrimination of pairs of stimuli: The perceptual distance between two stimuli is determined by the probability that a human observer can discriminate them. However, like Fechner’s JND approach, Thurstonian scaling is criticized because discriminability is, at best, only *indirectly*, and in ways yet to be understood, related to sensory magnitude (Krantz, 1972; Stevens, 1961).
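The Thurstonian idea can be made concrete in a few lines. Under the classic equal-variance Gaussian noise assumption (Thurstone's Case V), the scale distance between two stimuli is the inverse normal CDF of the observed discrimination probability. The following Python sketch is our own illustration of this mapping, not code from the cited work:

```python
# Minimal sketch of Thurstone's Case V idea: equal, independent Gaussian
# internal noise implies that the perceptual-scale distance between two
# stimuli is the inverse normal CDF of the discrimination probability.
from statistics import NormalDist

def thurstone_distance(p_discriminate: float) -> float:
    """Map a discrimination probability in (0, 1) to a scale distance.

    p = 0.5 means the stimuli are indistinguishable (distance 0);
    probabilities close to 1 imply a large perceptual distance.
    """
    return NormalDist().inv_cdf(p_discriminate)
```

For example, a stimulus pair that is discriminated 84% of the time lies roughly one standard deviation of the internal noise apart on the inferred scale.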

*direct* magnitude estimation (Stevens, 1957). In this approach, a human observer is asked to assign intensity values to physical stimuli such that the ratios of the reported values represent the ratios of the perceived magnitudes. However, Shepard pointed out that there might exist an unknown and undesirable *response transformation function* that the direct magnitude estimation method neglects (Shepard, 1981).

*method of triads* in the psychophysics literature. Based on a fixed discretization of the physical stimulus, say

**triplet question** (or, interchangeably, a triplet comparison).

*t*-distributed stochastic triplet embedding (*t*-STE) method described below.

*supra-threshold* differences in stimulus appearance (Torgerson, 1958; Coombs, Dawes, & Tversky, 1970; Marks & Gescheider, 2002). There exists another line of research that estimates the perceptual function based on comparisons between two pairs of stimulus levels (Schneider, 1980; Schneider, Parker, & Stein, 1974). The simple coordinate adjustment technique can estimate one-dimensional representations of perceptual scales if it has access to the answers to all comparisons between stimulus pairs (Schneider, 1980).

**ordinal embedding**. A number of fast and accurate algorithms have been developed to solve the ordinal embedding problem (Agarwal et al., 2007; Van Der Maaten & Weinberger, 2012; Terada & von Luxburg, 2014). As we will show in this article, these algorithms may also be useful in psychophysics, vision science and the cognitive sciences in general.

*d*-dimensional Euclidean representation of items, say

**stress** (Kruskal, 1964b):

*full* dissimilarity matrix as input. Alternatively, in a setting of triplet comparisons, one can also implement the algorithm with just knowledge of the ranking (ordering) of *all* the distance values

*t*-STE) or, for example, the large margin principle, as in soft ordinal embedding (SOE). As another consequence, machine learning models typically operate with one big optimization problem rather than splitting the task into several separate steps (such as first estimating distances and, in a second step, constructing an embedding), the rationale being that each intermediate estimation step is yet another source of error and overfitting.

^{2}As is the case for the stress function of NMDS, the likelihood function of MLDS is not convex with respect to the perceptual scale values

*d*-dimensional Euclidean space such that the Euclidean distances are consistent with the answers of the queried triplet questions. The consistency of an embedding with respect to triplet

**ordinal embedding** is to find an embedding

*multidimensional* embedding that describes the perceptual space of humans. Let us discuss two examples that demonstrate why this might be important. One famous example is *color perception*. Figure 3 (left) shows the two-dimensional color circle proposed by Shepard and Ekman (Shepard, 1962; Ekman, 1954). The figure has been constructed with the NMDS algorithm based on a 14

*pitch perception* of sounds. Even though auditory frequency is again one-dimensional, pitch is perceived along a three-dimensional helix (Shepard, 1982; Houtsma, 1995). Figure 3 (right) shows the perceptual space proposed by Shepard. In both cases, pitch and color, multidimensional ordinal embedding is the tool that enables a researcher to find perceived values in higher-dimensional Euclidean spaces, which might be necessary to properly capture the similarity structure of perception (or cognition).

*t*-STE, because in our experience, they work very well and are based on a simple model that is also plausible in a psychophysics setting. The STE method introduces the probabilistic model defined in Equation 4 to solve the ordinal embedding problem. Assume that

*t* functions with a heavier-tailed kernel (Van Der Maaten & Weinberger, 2012). The modified method is called *t*-distributed STE (*t*-STE).
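The triplet probabilities underlying STE and *t*-STE can be illustrated compactly: the probability of the answer "i is closer to j than to k" is a ratio of kernel similarities, Gaussian for STE and Student-*t* (with α degrees of freedom) for *t*-STE (Van Der Maaten & Weinberger, 2012). The Python sketch below is our own illustration of these probabilities, not the reference implementation:

```python
# Sketch of the STE / t-STE triplet model: the probability that item i is
# judged closer to j than to k is a ratio of kernel similarities.
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ste_prob(yi, yj, yk):
    """STE: Gaussian kernel exp(-d^2) on the squared distances."""
    num = math.exp(-sq_dist(yi, yj))
    return num / (num + math.exp(-sq_dist(yi, yk)))

def t_ste_prob(yi, yj, yk, alpha=1):
    """t-STE: heavier-tailed Student-t kernel with alpha degrees of freedom."""
    kernel = lambda d2: (1 + d2 / alpha) ** (-(alpha + 1) / 2)
    num = kernel(sq_dist(yi, yj))
    return num / (num + kernel(sq_dist(yi, yk)))
```

Both probabilities exceed 0.5 whenever j is indeed the closer item; the embedding is then found by maximizing the (log-)likelihood of all observed triplet answers under this model.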

*t*-STE have an acceptable running time. Our experiments are performed on an iMac 18.3 (2017) with a 3.4-GHz i5 quad-core processor. On this machine, the (*t*)-STE algorithm, implemented in MATLAB, requires about 30 min to embed 100 items in two dimensions using 2,000 triplet answers. As this analysis needs to run only once, after all the triplets of all participants have been recorded, we do not think that this is a problem in a typical psychophysical setting.

*d* dimensions. In our simulations, we use

**Providing triplet answers to the algorithms.** The above-mentioned model produces the answers to the triplet questions that serve as input for the triplet-based methods, namely, SOE, (*t*)-STE, and MLDS.

*t*-STE, we use the MATLAB implementation by Van Der Maaten and Weinberger (2012).^{3} We use the default optimization parameters for both methods. The number of degrees of freedom for the Student-*t* kernel is set to

*t*-STE method. We also use the R implementation of a second algorithm from the machine learning community, SOE, with the default parameter settings. For MLDS, we use the R package available on the CRAN repository,^{5} again with the default optimization parameter settings. For the NMDS algorithm, we use the MATLAB implementation, available via the function “mdscale.” The implementation optimizes the stress function defined by Kruskal (1964a); see Equation 1.

- (1) Mean squared error (MSE): For one-dimensional perceptual spaces where the ground truth is known, we can compute the MSE between the estimated scales \(\hat{y}\) and the true perceptual function values \(y\). However, we need to be careful, as embedding results are only unique up to similarity transformations (scaling, rotation, and translation). Before computing the MSE, we therefore apply two **normalization** steps. First, we transform the output of the embedding to lie in the range (0,1), as our scaling functions are defined in this range (more precisely, we shift the minimum value to zero and divide all values by the maximum). This takes care of translation and scaling. Second, the output is only unique up to rotation, which in our one-dimensional scenario consists of flipping the function values \(\hat{y}\) to \(-\hat{y}\) (note that if \(\hat{y}\) satisfies all triplet questions, so does \(-\hat{y}\)). We therefore choose whichever of \(\hat{y}\) and \(-\hat{y}\) results in the smaller MSE; in this way, we choose the best rotation of the output.
- (2) Triplet error: The MSE criterion is cumbersome to compute in multivariate scenarios, because we have to take into account all possible rotations of the embeddings. Moreover, in real-world scenarios, the MSE cannot be computed at all, because the required underlying ground truth is unknown. As an alternative, we propose to evaluate the quality of an embedding by its ability to predict the answers to (potentially new) triplet questions. To this end, we compute a quantity called the **triplet error**. Intuitively, the triplet error counts how many of the triplets are not consistently represented by the given embedding. Given an embedding \(\hat{y}_1, \ldots, \hat{y}_n\) and a validation set \(T^\prime\) of triplets, the triplet error of the embedding with respect to \(T^\prime\) is defined as
\begin{equation}
\text{triplet error} = \frac{1}{\vert T^\prime \vert }\sum_{t=(i,j,k)\in T^\prime} \mathbb{1}\left\lbrace R_t \cdot \operatorname{sgn}\!\left(\Vert \hat{y}_i - \hat{y}_k \Vert^2 - \Vert \hat{y}_i - \hat{y}_j \Vert^2\right) = 1\right\rbrace, \quad (5)
\end{equation}
where the characteristic function \(\mathbb{1}\) takes the value 1 if the expression in the curly braces is true (that is, if the estimated embedding is not consistent with the new triplet \(t\)) and the value 0 otherwise. Typically, the given set of answered triplets needs to be used both for constructing the embedding and for evaluating its quality. There are two ways to do this. The first, naive way is to set \(T^\prime = T\), meaning that we use the same set of triplets to construct the embedding and to measure its quality. The second way is to perform \(k\)-fold cross-validation to avoid overfitting: We partition the set of input triplets \(T\) into \(k\) nonintersecting subsets (“folds”) and perform the embedding and the evaluation \(k\) times. In each iteration, we pick one of the folds as the validation set (\(T^\prime\)) and the remaining folds as the training set (the input to the embedding algorithm). The final triplet error is the average over the triplet errors of the \(k\) validation sets.

Throughout the rest of the article, we refer to the latter approach as the **cross-validated triplet error**, while the first approach is simply called the **triplet error**.
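The triplet error is straightforward to compute. The Python sketch below is our own illustration; it assumes the convention that an answer \(R_t = +1\) encodes "j is closer to i than k is" (the article's sign convention for \(R_t\) may differ) and counts the fraction of validation triplets on which the embedding's distances disagree with the recorded answer:

```python
# Minimal sketch of the triplet error: the fraction of validation triplets
# whose recorded answer disagrees with the embedding's distances.
def triplet_error(embedding, validation_triplets):
    """embedding: list of points (tuples); validation_triplets: (i, j, k, R_t).

    Assumed convention: R_t = +1 means the observer judged j closer to i
    than k, R_t = -1 the opposite. The embedding's prediction is the sign
    of ||y_i - y_k||^2 - ||y_i - y_j||^2.
    """
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    errors = 0
    for i, j, k, r in validation_triplets:
        d_ij = sq_dist(embedding[i], embedding[j])
        d_ik = sq_dist(embedding[i], embedding[k])
        predicted = 1 if d_ik > d_ij else -1
        errors += int(predicted != r)  # count inconsistent triplets
    return errors / len(validation_triplets)
```

For a one-dimensional embedding `[(0.0,), (1.0,), (3.0,)]`, the triplet `(0, 1, 2)` with answer `+1` is consistent, while the same triplet with answer `-1` contributes an error.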

*t*-STE are quite similar to the STE; see supplementary material). The average (over 10 runs) MSE and triplet error of the various embedding algorithms are depicted in Figures 5d and 5e, respectively.

^{6}presented in Ekman (1954). We first construct a two-dimensional embedding using NMDS; see Figure 6a. In the following, this embedding will be considered our ground truth, which will then be used to generate further data (let us stress: we do not argue that this embedding is “correct” in any way; we just use it as a ground truth to generate further simulations).

*t*-STE), and embeddings in two dimensions using all algorithms except MLDS (which is not designed for this purpose).

*cross-validated triplet error* (see the definition in the simulation setup). Figure 7 (top) shows the average and standard deviation of the cross-validated triplet error for eight subjects and the four embedding methods: MLDS, STE, *t*-STE, and SOE. All algorithms have similar performance in this task.

*t*-STE for each of the eight subjects individually in Figure 7 (bottom). Note that these plots are generated with the full set of triplets, not only the training folds that are used to evaluate the triplet error. The resulting functions are similar, both across the two methods and across the participants. For some of the participants, however, we observe a noticeable difference between the embeddings of MLDS and *t*-STE, particularly for Subjects 1 and 2. For these subjects, (*t*)-STE constructs a nonmonotonic function, while the MLDS function tends to be monotonic. The main reason for this is the nature of the triplet questions. The participants were asked, “Which of the two slant pairs are more different

*reach*, *grain*, and *coherence*. An eidolon of a basis image is a parametrically altered version of this image. Reach controls the strength of a distortion (the higher the value, the stronger the amplification), grain modifies the fine-grainedness of the distortion (low values correspond to “highly fine-grained”), and a coherence value close to 1.0 indicates that “local image structure [is retained] even when the global image structure is destroyed” (Koenderink et al., 2017). From a perceptual point of view, we might want to know which image modifications influence the percept, and to what degree. Starting with a black-and-white image of a natural landscape as the basis image (see Figure 8, left), we generate 100 altered images, using reach and grain in

*t*-STE, SOE). As the best embedding dimension is unknown, we test dimensions in the range

*t*-STE consistently outperforms the other methods. Note that the results of MLDS in case

*t*-STE) leads to a cross-validated triplet error of around 0.15—but is an error of 0.15 acceptable for this task? Could a (significantly) lower error be achieved if one, for example, collected more triplets? To answer this question, we would need to know the error baseline of human participants: There might be a proportion of ambiguous triplets for which no obviously “correct” answer exists. If, for example, we knew that 80% of the triplet questions had an easy, obviously correct answer and 20% were so ambiguous that the answer was essentially random, then the best error rate we could hope for would be around 10%: On 80% of the triplets, we make no error, and on the remaining 20%, we guess randomly, getting about 10% right and 10% wrong. Of course, in the case of the Eidolon experiment, we do not have any external knowledge about the “difficulty” or “ambiguity” of the triplets, but we can try to estimate it. To this end, we conducted the following side experiment: We chose a set of 2,000 random triplets and presented each of them three times to each participant (the triplets were shuffled such that participants did not realize they were answering the same triplets repeatedly). We then estimated the “difficulty” of a triplet by how consistent the repeated answers were: If a subject answers the same triplet question differently on different presentations, we consider the question “hard,” and otherwise “easy.” We performed this experiment with our three participants, who showed the following percentages of hard triplets:
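The reasoning above is easy to turn into a small computation. The sketch below is our own construction: it estimates the fraction of "hard" triplets from the repeated answers of the side experiment and converts an assumed fraction q of ambiguous triplets into the best achievable error floor of q/2:

```python
# Back-of-the-envelope helpers for the error-floor argument in the text.
def hard_fraction(repeated_answers):
    """repeated_answers: one tuple of the three recorded answers per triplet.

    A triplet is considered 'hard' if its repeated answers disagree.
    """
    hard = sum(1 for answers in repeated_answers if len(set(answers)) > 1)
    return hard / len(repeated_answers)

def error_floor(ambiguous_fraction):
    """If a fraction q of triplets is answered at random, roughly q/2 of
    all triplets are unavoidably answered incorrectly."""
    return ambiguous_fraction / 2
```

With 20% ambiguous triplets, the floor is 10%, matching the worked example in the text.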

**three** questions. Therefore, the complete set of possible triplet questions contains

**cross-validated triplet error** (see Equation 5)—indeed, we suggest that this may be a good idea for MLDS and NMDS, too. The chosen subset of triplets needs to be partitioned into training and validation sets. The embedding method finds a Euclidean embedding for the perceptual scales, given the training set of triplets as input. We then calculate the cross-validated triplet error on the validation set. This procedure is preferable to the triplet error that is evaluated on the very same set that is used to construct the embedding; the latter can be highly biased and typically underestimates the true triplet error (overfitting).
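The k-fold procedure described above can be sketched generically. In this illustration of ours, `embed` stands in for any embedding routine (STE, *t*-STE, SOE, ...) and `evaluate` for the triplet error on the held-out fold; both are placeholder parameters, not names from any of the cited packages:

```python
# Sketch of k-fold cross-validated triplet error. `embed` maps a training
# set of triplets to an embedding; `evaluate` scores it on held-out triplets.
import random

def cross_validated_triplet_error(triplets, embed, evaluate, k=10, seed=0):
    """Partition the triplets into k folds; embed on k-1 folds, evaluate on the rest."""
    shuffled = list(triplets)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[f::k] for f in range(k)]  # k nonintersecting subsets
    errors = []
    for f in range(k):
        validation = folds[f]
        training = [t for g in range(k) if g != f for t in folds[g]]
        embedding = embed(training)             # fit only on the training folds
        errors.append(evaluate(embedding, validation))
    return sum(errors) / k                      # average over the k validation sets
```

Because the validation fold never enters the embedding step, the resulting error estimate avoids the overfitting bias of evaluating on the training triplets themselves.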

*t*-STE as our method of choice. The authors’ original MATLAB implementation is available at https://lvdmaaten.github.io/ste/Stochastic_Triplet_Embedding.html.

*International Conference on Artificial Intelligence and Statistics (AISTATS)* (pp. 11–18). San Juan, Puerto Rico: PMLR.

*Journal of Vision*, 17(1), 37, doi:10.1167/17.1.37.

*Advances in Neural Information Processing Systems (NIPS)* (pp. 810–818).

*International Conference on Machine Learning (ICML)* (pp. 1472–1480).

*Bernoulli*, 23(3), 1663–1693, doi:10.3150/15-BEJ792.

*Conference on Learning Theory (COLT)* (Vol. 49, pp. 310–335). Columbia University, New York, NY, USA: PMLR.

*Cognitive psychology: An overview for cognitive scientists*. New York: Psychology Press.

*Journal of the Optical Society of America A*, 33(3), A30–A36.

*Mathematical psychology*. Upper Saddle River, NJ: Prentice-Hall.

*Nature Neuroscience*, 4, 1244.

*IEEE Transactions on Visualization and Computer Graphics*, 20, 1933–1942.

*Elemente der Psychophysik (Elements of psychophysics)*. Leipzig: Breitkopf und Härtel.

*Annual Review of Psychology*, 39, 169–200.

*Signal detection theory and psychophysics* (Vol. 1). New York, NY: Wiley.

*International Conference on Artificial Intelligence and Statistics (AISTATS)* (pp. 851–859). Fort Lauderdale, FL, USA: PMLR.

*Psychological Science*, 19(2), 196–204, doi:10.1111/j.1467-9280.2008.02067.x.

*Hearing*, 6, 262.

*Advances in Neural Information Processing Systems (NIPS)* (pp. 2711–2719). Barcelona, Spain: Curran Associates, Inc.

*Annual Allerton Conference on Communication, Control, and Computing (Allerton)* (pp. 1077–1084). Monticello, IL, USA: IEEE.

*PLoS One*, 10, 1–27.

*European Journal of Neuroscience*, 22, 212–224.

*Conference on Learning Theory (COLT)* (pp. 40–67). Barcelona, Spain: PMLR.

*International Conference on Artificial Intelligence and Statistics (AISTATS)* (pp. 471–479). San Diego, CA, USA: PMLR.

*Perception*, 36, 1.

*Journal of Statistical Software*, 25, 1–26.

*Modeling psychophysical data in R* (Use R!, Vol. 32), 229–256.

*Journal of Vision*, 17(2), 7, doi:10.1167/17.2.7.

*Visual psychophysics* (pp. 660–689). Berlin, Heidelberg: Springer.

*Foundations of measurement: Vol. 1. Additive and polynomial representations*. New York, NY: Academic Press.

*Psychometrika*, 29, 1–27.

*Psychometrika*, 29, 115–129.

*Psychological Research*, 59, 134–144.

*CogSci* (pp. 1427–1432). Philadelphia, PA, USA.

*SIAM Review*, 56, 3–69.

Preprint available at arXiv, abs/1906.11655.

*Psychological Review*, 65, 222.

*Journal of Mathematical Psychology*, 1, 1–27.

*Entropy*, 17, 5402–5421.

*Journal of Vision*, 3(8), 5, doi:10.1167/3.8.5.

*Stevens’ handbook of experimental psychology: Vol. IV. Methodology in experimental psychology* (pp. 91–138). New York: John Wiley and Sons.

*System of diseases of the eye* (Vol. 3). Philadelphia: JB Lippincott.

*PLoS Computational Biology*, 15, 1–27.

*Psychometrika*, 42, 241–266.

*Cognitive Psychology*, 3, 382–407.

*Journal of the Optical Society of America A*, 22, 801–809.

*Vision Research*, 44, 1511–1535.

*Journal of Vision*, 7(6), 3, 1–21, doi:10.1167/7.6.3.

*Perception & Psychophysics*, 28, 493–503.

*Psychometrika*, 45, 357–372.

*Journal of Mathematical Psychology*, 11, 259–273.

*Advances in Neural Information Processing Systems (NIPS)* (pp. 41–48). Vancouver and Whistler, British Columbia, Canada: MIT Press.

*Psychometrika*, 27, 125–140.

*Journal of Mathematical Psychology*, 24, 21–57.

*Psychological Review*, 89, 305.

*Journal of Personality and Social Psychology*, 48, 813.

*Japanese Psychological Research*, 20, 7–17.

*International Conference on Machine Learning (ICML)* (pp. 673–680). New York, NY, USA: ACM.

*International Conference on Machine Learning (ICML)* (Vol. 32, pp. 847–855). Beijing, China: PMLR.

*Theory and methods of scaling*. New York, NY: John Wiley.

*Conference on Human Computation and Crowdsourcing (HCOMP)*.

*International Workshop on Machine Learning for Signal Processing (MLSP)* (pp. 1–6). Santander, Spain: IEEE.

*The Stevens’ handbook of experimental psychology and cognitive neuroscience* (4th ed., Vol. V). Wiley.

*Electronic Imaging, Human Vision and Electronic Imaging*, 2017, 36–45.