Abstract
Meanings and qualities are fundamental attributes of visual awareness. We propose “eidolons” as a tool for establishing equivalence classes of appearance along meaningful dimensions. The “eidolon factory” is an algorithm that generates such stimuli in a meaningful and transparent way. The algorithm allows us to focus on the location, scale, and size of perceptually salient structures, proto-objects, and perhaps even semantics, rather than on global overall parameters such as contrast and spatial frequency. The eidolon factory is based on models of the psychogenesis of visual awareness. It affects the image by disrupting image structure across space and spatial scales. This is a very general method with many potential applications, of which we illustrate a few instances. We present results for the example of tarachopic amblyopia, showing that scrambled vision is indeed an apt interpretation.
Wouldn't it be nice to experience what colorblind people are seeing or what tarachopic amblyopes have to cope with? Viénot, Brettel, Ott, M'Barek, and Mollon (
1995) told us about the former by simulating the visual appearance of unilateral dichromats. Hess (
1982) informed us about the latter by requiring patients to draw what they see. This is by no means trivial. Unilateral dichromats are not all that clear in their reports (Sloan & Wollach,
1948), although they surely should be the first to know! It becomes clear that such introspective reports are less easy to come up with than it might seem if you try to describe, even to yourself, what you experience as you see a bookshelf in the periphery of your visual field. You may somehow be aware of the presence of books with titles written on their spines, yet you cannot identify the books or read the titles. The perceptual quality of peripheral vision appears contradictory and defies your ability to describe your sensations, yet surely
you are the first to know. Perhaps Titchener's (
1902) lab manual should be consulted; he explains in detail how to use introspection.
Phenomena such as visual crowding have been studied by measuring detection and discrimination thresholds. The appearance of suprathreshold stimuli is much harder to study. Exceptions are very specific situations. One instance is direct contamination between flanker and target objects (Greenwood, Bex, & Dakin,
2010). One really needs first-person reports—for instance, having observers draw what they see (Metzger,
1936; Sayim & Wagemans,
2013). However, the (in)ability of observers to reproduce their visual experience limits this technique to relatively simple stimuli. Another way is to use verbal descriptions (for review, see Lettvin,
1976; Metzger,
1936; Pelli,
2008). Of course, the use of first-person reports is beset with difficulties.
Another approach might take its lead from methods as proposed by Viénot et al. (
1995), who transformed color images to emulate dichromatic vision. Why not preprocess a stimulus in ways that emulate peripheral processing and present it foveally? Indeed, such methods have been proposed and used to good advantage by authors such as Rosenholtz and colleagues (e.g., Balas, Nakano, & Rosenholtz,
2009). In such cases, one really needs to test various methods of preprocessing to mimic the peripheral data. This might (at least) serve to mark various hypotheses about what goes on in peripheral vision as viable or perhaps worth pursuing.
Such methods have considerable potential. However, in order to wield them effectively, one needs to be able to explore a reasonable environment of the original stimulus. Because images can be transformed into other images in infinite ways, whereas empirical research can explore only limited ranges, there is a need for controlled variation based on our present understanding of visual processes. This is by no means an understood issue and has remained more of an art (Balas & Conlin,
2015). One really needs a much more transparent and intuitive way of parameterizing and perturbing stimuli.
Here we introduce a novel processing algorithm that produces stimuli differing from a given image in ways that are controlled by a limited number of clearly understandable parameters. Within this parametric space, we refer to the subset of stimuli that are equivalent along a given perceptual domain as
eidolons (see
Appendix D).
Of course, one could use a very simple parameter space. The variable contrast of sine-wave gratings used in a modulation transfer function (MTF; Schade,
1948,
1956) measurement draws on a one-parameter space of eidolons, the parameter being Michelson contrast. At the other extreme, one could use a very complex parameter space, to the point of parameterizing the luminance of every pixel in an image. Independently of this, one could use a very strict criterion of perceptual equivalence (e.g., defining two stimuli as equivalent only when they are metameric, that is, fully indistinguishable) or a wider-sense equivalence criterion based, for instance, on semantics. In this wider sense, all the well-known, even famous, variations on Leonardo's Mona Lisa by Marcel Duchamp, Fernando Botero, and many other artists are eidolons.
In experimental phenomenology, one aims at parameterizations that naturally fit generic visual presentations rather than imposed physical ones. An example of two physical parameters that do not map cleanly to perception is the pair contrast and sharpness. As photographers know, a high-contrast print often serves to save a slightly unsharp shot. Likewise, low-contrast prints are often considered unsharp. In such cases a natural space of eidolons may be a useful platform from which to launch research.
Another potential use of eidolons is in the study of visual anomalies and agnosias. Well-known examples are renderings of color photographs that are intended to suggest to the normal trichromat what the experiences of various dichromats might be like. While this use of eidolons might not directly answer scientific questions, such examples are useful because they offer the generic observer an opportunity to better understand more or less singular observers and thus interact more effectively with them. For instance, one might adapt one's printed or projected figures and text for more universally effective communication. Thus, the topic holds some genuine interest, even though mostly from an applied science point of view.
The body of this article is structured in three sections.
Theory details the theoretical framework in which we define our concept of eidolon.
Implementation contains the general description of an eidolon factory based on scale decomposition and spatial disarray.
Examples shows some examples of experiments in which stimuli produced by the eidolon factory could be used.
The notion of “local sign” (meaning “positional signature”;
Localzeichen in German
) is due to Hermann Lotze (
1852). Although it was considered of major importance at the time (mid-19th century) and frequently occupied people such as Helmholtz (
1892), it has largely been forgotten today. In order to appreciate what the local sign problem is, consider the following.
Axons carry spike trains, one action potential looking much like the other. A recording from any given axon will fail to reveal which location of the visual field the neuron is serving, what the specific property the neuron is messaging about might be, what its preferred direction or orientation is, what the current gain factor of the neuron is, and so forth. Of course, the brain scientist knows. This is possible because the scientist knows the current stimulus, has poked an electrode at a certain location, can look at the anatomy, and might have additional information concerning the neuron. But how about the brain itself? It does not know the stimulus, it cannot look at its own anatomy, and it is unlikely to maintain databases on all of its neurons.
Modern brain scientists consider the discovery of somatotopy to have rendered Lotze's problem a nonissue
(Kaas,
1990). We hold a rather different opinion on this and believe that the problem might be even more pressing than it was felt to be at the time.
There are indications that an apparently physiologically intact visual brain might still lack local sign. We refer to the condition of
tarachopia (“scrambled vision”) as reported by Hess (
1982). In a case of unilateral tarachopia, the tarachopic eye might have acuity and contrast sensitivity just as good as that of the normal eye, yet the amblyopia might be serious enough to prevent the owner from making out the headlines of today's newspaper. Apparently tarachopia involves lacking or disturbed local sign. The local structures are there, but the differential geometric connections are lacking. The physiological basis is (as yet) unknown, so to this day tarachopia has to be classified as an agnosia (
Seelenblindheit, or “soul blindness”). This shows that local sign is an ill-understood mechanism that lies at the basis of visual awareness and is not a mere philosophical fiction, as is sometimes suggested.
We will not enter into the possible mechanisms of local sign here (see elsewhere—e.g., Koenderink,
1984a), but we suggest that scrambling local sign, or local disarray of image elements, might be an apt model of certain forms of visual equivalences. For instance, the phenomenon of crowding in the peripheral visual field (Bouma,
1970; Pelli,
2008) is phenomenologically very similar to the tarachopic condition as it occurs in the focal vision of certain patients (e.g., Hess,
1982; Sayim & Wagemans,
2013).
From a phenomenological perspective, at least some aspects of local sign appear to be implemented on the fly in the psychogenesis of visual awareness. If one cuts an image into pieces and displaces the pieces randomly, the disarrayed image looks fairly normal (
Figure 2). Masking the seams between the pieces (e.g., with gray stripes) results in the experience of an undisturbed image, seen behind the mask (Koenderink, Richards, & van Doorn,
2012a). This also works in space time (Koenderink, Richards, & van Doorn,
2012b). In such cases the optical data are incoherent, whereas the visual awareness is coherent, a most remarkable fact! Apparently, perception constructs an orderly image where physically there is chaos. These effects work over large distances and time spans. They remain ill (euphemism for “not at all”) understood.
Such empirical facts suggest that the psychogenesis of visual awareness imposes spatiotemporal coherence in an active way. Possibly classical local sign might—at least partly—depend on this. The crowding phenomenon (Bouma,
1970; Pelli,
2008) further suggests that such a mechanism is confined to focal vision. We expect the eidolon factory to become an important tool in the study of such phenomena.
Every factory run necessarily starts with a fiducial image. This image is the basis for a huge data structure that is perhaps best thought of as a simulation of the cortical activity induced by that image. This is the structure that will eventually be used in the synthesis. Once set up, the fiducial image itself has become irrelevant. This is not so much an analysis as merely a dumb formatting stage.
Consider an image as a discrete sample of a scalar field (intensity say) defined over the Euclidean plane. The first chore is to represent it at multiple levels of resolution. This is the proper topic of scale space (Florack,
1997; Koenderink,
1984b; Lindeberg,
1994; ter Haar Romeny,
2008). It has long since become a de facto standard in image processing. In practice, one samples both the scale and the space domain discretely.
In order to understand the scale-space data structure one needs a number of important insights, all based on the basic scale-space structure. Although we will not prove it here (see the textbooks quoted previously), scale space is based on the Gaussian kernel as the point operator. A convenient scale parameter is the half-width (standard deviation) of the Gaussian blurring kernel. In
Figure 3 we show the (sampled) scale space for an image.
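Such a sampled scale space is easy to construct. Below is a minimal Python sketch (using NumPy and SciPy); the function name `scale_space` is ours, for illustration only:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def scale_space(image, sigmas):
    """Sampled Gaussian scale space: one blurred copy of the image per
    scale level. The scale parameter is the standard deviation
    (half-width) of the Gaussian blurring kernel, as in the text."""
    return np.stack([gaussian_filter(image, sigma=s) for s in sigmas])

# Example: a random test image represented at four scale levels.
rng = np.random.default_rng(0)
img = rng.random((64, 64))
levels = scale_space(img, sigmas=[1, 2, 4, 8])
```

Each coarser level could equally well be computed by further blurring a finer level, which is the redundancy noted below.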
Because the image at some scale can be computed from any image at some finer scale, this representation is inconveniently—and quite unbrainlike—redundant. It is preferable to find the differences between adjacent scale levels. A stack of such difference images will be our basic data structure. It represents the simplest possible nontrivial structure that has the desirable properties listed above: local, isotropic, translation invariant, and self-similar.
The difference layers are just the fiducial image as represented in difference of Gaussian (DOG) receptive fields of various sizes (
Figure 4). Another way to understand these layers is as a stack of scale derivatives of the image. This explains the use of DOG filters in sharpening images (Margulis,
1998,
2005) in applications such as Adobe Photoshop (San Jose, CA).
The implication is that all layers added together simply recompose the image. This is indeed obvious from the definition of the differences. But although trivial, this is a crucial point. The eidolon factory analyzes the image and synthesizes it again. If done exactly like this, the synthesis would construct the perfect doppelgänger: the picture itself! The desirable fuzziness of the eidolons derives from perturbations applied to the parts before the synthesis. This is the eidolon factory in a nutshell.
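The exact recomposition from difference layers can be verified directly. A minimal sketch, assuming Gaussian blurs at a handful of scales (the helper name `dog_stack` is ours):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_stack(image, sigmas):
    """Differences between adjacent scale-space levels (DOG layers).

    The finest layer is the difference between the image itself and the
    first blur level; the sum of all layers plus the coarsest blur
    recomposes the image exactly (a telescoping sum)."""
    blurred = [image] + [gaussian_filter(image, s) for s in sigmas]
    layers = [blurred[i] - blurred[i + 1] for i in range(len(sigmas))]
    base = blurred[-1]  # residual coarse-scale image
    return layers, base

rng = np.random.default_rng(1)
img = rng.random((32, 32))
layers, base = dog_stack(img, sigmas=[1, 2, 4])
recon = base + sum(layers)  # synthesis: the layers add up to the image
```

Perturbing `layers` before the final sum is, in a nutshell, what the eidolon factory does.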
Yet another interpretation is important. The differences show local transition regions. This is different from edge finder images in that the local transitions have sides, whereas mere edginess—as yielded by programs such as Photoshop—fails to represent that. This is an important topic. It suggests that the integration over DOG activity can also be understood as a synthesis that combines all edgelet (or transition area) samples. This can indeed be proven formally (Koenderink et al.,
2016). It is somewhat intricate because the argument involves edge finders, line finders (directionally and orientationally tuned “simple cells”;
Figure 5; Hubel & Wiesel,
1968), and Laplacian operators. It allows one to use the DOG responses as summaries of the (far more numerous and involved!) simple cell responses. For details we refer to earlier work (Koenderink,
1990).
In perturbing local sign, we identify three conceptually different handles. These relate to the structure of the disarray fields. We denote them reach, grain, and coherence, and discuss each in sequence. All are based on disarray generated by Gaussian random fields that are statistically uniform and isotropic. Of course, it is sometimes desirable to consider nonuniform and/or nonisotropic perturbations. This is perfectly possible (we show an example later), but one then needs to forge a suitable parameterization that applies to such special cases.
Basic Gaussian random fields are easily obtained by blurring Gaussian white noise. Generating two independent instances at the same blur level and with the same power yields Gaussian random displacement vector fields. One simply treats the scalar fields as the Cartesian coordinates of displacement vectors. Here we use the fact that an isotropic normal distribution in the plane is separable into two mutually independent scalar normal distributions. This is an important, highly remarkable property that renders the normal distribution unique. In
Figure 8 we illustrate such random displacement fields.
The example of
Figure 8 already introduces one fundamental parameter, the grain. Another parameter, one that is independent of grain, we call the
reach. The reach of the disarray (
Figure 9) measures how much a pixel will be displaced in some suitable statistical sense. Thus, it is an amplitude, or intensity-like parameter. If one scales all vectors of a random vector field by the same factor, one changes the reach, whereas the grain is not affected.
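A hedged sketch of how grain and reach might be realized, following the recipe above: blur white noise to set the grain, then rescale the vectors so that the root-mean-square displacement equals the reach. The names `disarray_field` and `apply_disarray` are our own, and the RMS convention is just one possible "suitable statistical sense" of reach:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def disarray_field(shape, grain, reach, rng):
    """Gaussian random displacement field.

    grain: blur width of the underlying white noise (spatial scale of
    the disarray); reach: RMS displacement amplitude in pixels. Two
    independent scalar fields serve as the x and y components."""
    dx = gaussian_filter(rng.standard_normal(shape), grain)
    dy = gaussian_filter(rng.standard_normal(shape), grain)
    # Normalize so the root-mean-square displacement equals `reach`.
    rms = np.sqrt(np.mean(dx**2 + dy**2))
    return reach * dx / rms, reach * dy / rms

def apply_disarray(image, dx, dy):
    """Resample the image at the displaced positions (local-sign disarray)."""
    rows, cols = np.indices(image.shape)
    return map_coordinates(image, [rows + dy, cols + dx],
                           order=1, mode='reflect')

rng = np.random.default_rng(2)
img = rng.random((64, 64))
dx, dy = disarray_field(img.shape, grain=4.0, reach=3.0, rng=rng)
eidolon = apply_disarray(img, dx, dy)
```

Scaling `dx, dy` by a common factor changes the reach only; changing `grain` changes the spatial granularity only, exactly the independence described in the text.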
The grain and the reach suffice to parameterize many disarrays of interest. A third parameter comes into play when one regards details at different scales. We illustrate this with
Figure 10.
The coherence over scale of the disarray is the third parameter. It defines the degree to which the displacement fields are correlated across scales. It takes values of zero (incoherent) if the random displacement field is generated independently for each scale; it takes a value of one (fully coherent) if the displacement fields at every scale are constructed by filtering the same Gaussian white noise samples. Coherence measures how the displacements of overlapping receptive fields of different sizes are mutually correlated. This is highly important in many applications. Coherent disarray retains the local image structure even when the global image structure is destroyed. As a result, coherent disarray appears like deformation, whereas incoherent disarray appears like diffusion or shuffling.
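One plausible way to realize the coherence parameter is to mix a single shared white-noise sample with per-scale independent samples before filtering. This is an assumed parameterization for illustration, not necessarily the one used in the factory:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def multiscale_fields(shape, grains, coherence, rng):
    """One displacement component per scale, with adjustable coherence.

    coherence = 1: every scale filters the *same* white-noise sample, so
    displacements at overlapping locations are correlated across scales;
    coherence = 0: each scale gets independent noise. Intermediate
    values mix the two sources with the appropriate variance weights."""
    shared = rng.standard_normal(shape)
    fields = []
    for g in grains:
        own = rng.standard_normal(shape)
        noise = np.sqrt(coherence) * shared + np.sqrt(1.0 - coherence) * own
        f = gaussian_filter(noise, g)
        fields.append(f / f.std())  # equalize power across scales
    return fields

rng = np.random.default_rng(3)
coh = multiscale_fields((128, 128), [2, 4, 8], coherence=1.0, rng=rng)
inc = multiscale_fields((128, 128), [2, 4, 8], coherence=0.0, rng=rng)
```

With full coherence the fields at neighboring scales are strongly correlated (the disarray looks like a deformation); with zero coherence they are uncorrelated (the disarray looks like diffusion or shuffling).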
Each of these parameters—the grain, the reach, and the coherence—has a specific and characteristic effect. Of course, apart from the disarray, there is also the blur of the original image even before it is subjected to disarray. One may consider the blur as a fourth parameter, although it is one that is really distinct from the local sign variation. Blur and disarray can be combined in various ways.
In this article we do not so much present a fixed algorithm as a class of intimately related methods. Instead of a single algorithm with several (perhaps many) parameters, we suggest a toolbox from which one selects, for any given problem, the most appropriate method or combination of methods to implement certain desiderata in the simplest and most efficient way.
The toolbox method lets one zoom in on a problem in an intuitive manner, with the nature of the remaining parameters being immediately apparent. This allows one to focus on the phenomena themselves. No doubt most eidolon factory implementations will be ephemeral, constructed for a specific purpose. However, the toolbox will be fixed; at most it will be added to, although only sparingly.
Related to this is that we do not consider the eidolon factory to be a straight emulation of the cortex at all. What we try to provide is an interface to the phenomenal complexity, whether in the mind, the brain, or the world. Good interfaces hide irrelevant complexity and are therefore very different from emulations (Hoffman,
2009; Koenderink,
2011). The toolbox preferably implements summary accounts rather than simulations of complexities that are irrelevant to the task at hand. Ideally, the eidolon factory should be trivial and thus conceptually transparent.
Here are some examples of this type of approach. First-order directional simple cells are similar to edge finders, or edge detectors. In our formalism, they are tangent vectors at some specific scale. Because the tangent space is two dimensional, a basis of just two of these—most conveniently with orthogonal direction preferences—suffices at each location.
This is functionally equivalent to the overdetermined continuous basis in the cortical implementation, for any directional sensitivity can be implemented on the fly (often known as
steerable filters; W. T. Freeman & Adelson,
1991). This is a natural consequence of formal differential geometry (Koenderink & van Doorn,
1992). Of course, in real life one needs to consider the effects of perturbations. For instance, removing one basis vector hardly matters to the cortical overdetermined basis but would render the simple formal system inoperative.
Something similar goes for the second-order structure. In the cortex this is the system of line finder simple cells. In the formal treatment, one requires a basis of three items: either the components of the Hessian (a symmetric tensor) or second-order directional derivatives at 60° orientation increments. Thus, a cortical column can be summarized by such a triple. Again, any orientation can be implemented on the fly. There is no loss of computational possibilities, despite this being a mere summary account.
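The first-order "on the fly" synthesis can be sketched concretely: two orthogonal Gaussian derivative filters form the basis, and any direction is obtained as a linear combination, the steering formula being D(θ) = cos(θ)·Dx + sin(θ)·Dy. The function names below are ours, and this is a formal sketch rather than a claim about cortical wiring:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gradient_basis(image, sigma):
    """Two orthogonal first-order Gaussian derivatives: the minimal basis."""
    gx = gaussian_filter(image, sigma, order=(0, 1))  # derivative along x (axis 1)
    gy = gaussian_filter(image, sigma, order=(1, 0))  # derivative along y (axis 0)
    return gx, gy

def steer(gx, gy, theta):
    """Synthesize the derivative in direction theta on the fly (steering)."""
    return np.cos(theta) * gx + np.sin(theta) * gy

rng = np.random.default_rng(4)
img = gaussian_filter(rng.random((64, 64)), 2.0)
gx, gy = gradient_basis(img, sigma=2.0)
g45 = steer(gx, gy, np.pi / 4)  # a 45-degree "edge finder" from just two filters
```

Removing either of `gx` or `gy` would render this minimal formal system inoperative, whereas the cortical basis, being overdetermined, would hardly notice, as the text points out.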
Starting with these differential operators of order less than three, there are a number of relations that are of immediate importance to possible synthesis and thus the eidolon factory. The addition of line finders at all orientations at a given location yields the Laplacian operator, which is the DOG profile for infinitesimal (small) size difference. Indeed, the DOG layers give a visually intuitive edge representation at a given scale. This makes intuitive sense because the difference of sharp and blurred instances of an image retains just the boundary regions. The edges are drawn as the Pinna watercolor illusion double lines (Pinna,
1987,
2008; Pinna et al.,
2001,
2003). Because all DOG layers add up to the fiducial image, one sees that the image can be regarded as synthesized from its edgelets. This is indeed a formal theorem (Koenderink et al.,
2015).
With such relations in mind, it is understandable that perturbations in a complete cortexlike emulation can be captured by summary methods at various levels, from that of the simple cells all the way to the fiducial image. When this is indeed possible, one should opt for it because it allows one to ignore much detail that is actually causally ineffective.
This is the very notion behind the proposal of the eidolon factory as a toolbox. In psychophysical experiments it would make good sense to start with the simplest eidolons and progress to more complicated instances—perhaps eventually going all the way to the simple cells—when the empirical data cannot be described in the simpler way. After all, understanding vision means understanding it
conceptually, not building an emulation of brain events that can itself hardly be understood because of an overdose of causally ineffective complexity.
The visual field is roughly scale invariant. Not being dedicated to any specific scale rules out the single-scale disarray discussed above. One needs to acknowledge the existence of many scales. This cannot be done by mere blurring. Fourier-based methods cannot deal with this. The method somehow has to recognize the spectrum of scales. The simplest method that does this is fractal disarray. It simply disorganizes the fiducial image, but it does this in a scale-independent manner.
There are numerous ways in which one might implement scale-dependent disarray. We implemented a very simple parameterization for demonstration purposes. One distributes monotonically over the scale domain the degree to which large fields drag smaller ones with them, that is, the degree to which the random displacement at any given location is correlated across scales. The strength of this coupling then becomes the control parameter.
This control effectively controls the fractal dimension of the local sign displacement field. It has a huge influence on pictorial structure. This is the coherence. It is a parameter of considerable conceptual interest and, in our view, a major aspect of human vision that is hardly documented in the standard textbooks.
Here we illustrate the use of eidolons to simulate, in generic observers, deficits akin to those observed in the amblyopic condition of tarachopia. Tarachopia is a visual agnosia described by Hess in 1982. Hess described a form of amblyopia in which an observer had two perfectly good eyes according to standard criteria. That is, the MTFs for sine-wave grating contrast thresholds were virtually identical for both eyes, with good contrast detection thresholds and good high spatial frequency roll-off. Yet, in the visual awareness of the observer, one eye was normal, whereas the other was unfit to read the headlines of a newspaper. It was classified as amblyopic, which is why the patient had sought treatment. So what was wrong? Hess had the bold idea to let the patient report on the nature of immediate visual awareness. Compared with the “good” eye, the visual awareness of the “bad” eye appeared spatially scrambled to the patient. Hence Hess' term tarachopia, “scrambled vision,” for this form of amblyopia.
Tarachopia is a paradigmatic case because it shows the categorical difference between psychophysics proper and experimental phenomenology (Albertazzi,
2013; Koenderink,
2015a). In psychophysics proper, one uses only objective methods. But awareness is about subjective facts, a topic of phenomenology. When Hess measured contrast detection thresholds for sine-wave gratings, he was practicing psychophysics. But when he asked the patients what they saw, he switched over to experimental phenomenology, where objectivity is replaced with intersubjectivity.
Thus, the topic is a conceptually highly interesting one with important potential implications for neuroscience, psychophysics, and phenomenology. This is why we propose to study tarachopia in generic observers by imposing controlled spatial disarray on the stimulus.
We attempt to create artificial tarachopia in normal observers using eidolons. This yields a way to classify possibly distinct forms of tarachopia, which can be explored by varying the parameterization of the eidolon factory. It thus opens up a novel field of endeavor.
We depart from the standard MTF analysis of spatial vision (Cornsweet,
1970; Van Nes & Bouman,
1967), the detection thresholds for sine-wave grating modulations of an otherwise uniformly gray field. Generic human observers fail to detect gratings at spatial frequencies higher than about 50 cycles/degree and detect a range of intermediate spatial frequencies at Michelson modulations somewhat below 1%. Such analysis was introduced in television engineering by Otto Schade (
1948,
1956), who measured the first MTF.
One typically records the detection of a grating as compared to a uniform field. A more objective method records the ability of observers to discriminate between horizontal and vertical gratings. For generic observers, these methods yield very similar results. However, in the case of tarachopic amblyopes, the difference is categorical. Such observers detect the presence of gratings just as well as anyone else, yet they fail to discriminate the orientation. They detect a pattern but fail to identify it. Hess' suggestion that this might be due to their scrambled visual field makes intuitive sense.
In order to study this suggestion in generic observers, we produce eidolons of sine-wave gratings using a coherent local sign disarray. Representative examples of stimuli are shown in
Figure 15.
The technical details of the experiment are pretty standard. The field of view was 10° × 10°, the average luminance of the uniform surround (30° × 20°) was 400 cd/m², the viewing distance was 57 cm, and the experiment was conducted in a dark room. We used a 2°-wide smooth transition from the grating to the uniform field. The display was linearized. We used an 8-bit digital-to-analog converter, which is only marginally suited to the usual MTF measurements but yields ample resolution for the eidolons. Observers were the authors. They have good spatial vision and fixated the center of the screen. Notice that there is no need for observers to be naïve in threshold experiments.
We performed two measurements. In the first we determined the MTF for seeing “something” instead of “nothing.” In the second we determined the threshold for discrimination between horizontal and vertical gratings. In the first method we aimed at the 50% detection threshold; in the second we aimed at the 75% discrimination threshold. Thresholds were determined by way of simple up–down methods. Results do not differ among the three observers; we show the overall average in
Figure 16.
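For concreteness, here is a minimal sketch of the kind of simple up–down procedure referred to here, run on a simulated observer. The 1-up/1-down rule converges on the 50% point of the psychometric function (the 75% point would require, e.g., a transformed or weighted rule); all names and numbers are illustrative assumptions, not the actual experimental code:

```python
import numpy as np

def updown_staircase(respond, start, step, n_trials):
    """Simple 1-up/1-down staircase: step down after a correct response,
    up after an incorrect one; the threshold estimate is the mean of the
    last few reversal levels."""
    level = start
    reversals, last_dir = [], 0
    for _ in range(n_trials):
        direction = -1 if respond(level) else +1  # correct -> harder (down)
        if last_dir and direction != last_dir:
            reversals.append(level)
        last_dir = direction
        level = max(level + direction * step, step)  # keep level positive
    return float(np.mean(reversals[-6:]))

# Simulated observer: detection probability is a sigmoid of stimulus level,
# with a 50% point at true_threshold = 1.0 (arbitrary units).
rng = np.random.default_rng(5)
true_threshold = 1.0
def observer(level):
    p = 1.0 / (1.0 + np.exp(-4.0 * (level - true_threshold)))
    return rng.random() < p

estimate = updown_staircase(observer, start=2.0, step=0.1, n_trials=200)
```

The staircase settles where up and down steps are equally likely, i.e., at the 50% point of the simulated psychometric function.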
For the detection of contrast, irrespective of pattern, there is no appreciable difference between the sine-wave gratings and the eidolons. Both contrast sensitivity and high spatial frequency roll-off are normal. But in the case of recognition, here the overall impression of horizontal or vertical orientation, the eidolons suffer badly.
In the case of the true gratings, observers are aware of the orientation at threshold. But in the case of the eidolons, observers may have a hard time making out the orientation even if they see the contrast modulations well enough. Of course, this is hardly a surprise given the nature of the stimuli (
Figure 16)! The paradigm puts the generic observer in a position similar to that in which the tarachopic amblyope finds herself when viewing pure grating patterns.
Notice that the eidolon paradigm neatly emulates Hess' findings for a tarachopic observer in generic vision. To the extent that all psychophysical testing yields the same results, the subjective report of the amblyope can be replaced with the stimulus description (
Figure 15). Thus, in a certain sense, the eidolon paradigm yields an objective emulation of the subjective report. Of course, the direct proof of perceptual equivalence could be achieved only when tarachopic observers are confronted with intact images in the affected eye and eidolons in the healthy eye. We suggest this as an obvious development of this line of research.
There are infinitely many ways to generate eidolons. The cloud of acceptable variations on any fiducial image is huge, albeit nothing when compared with the space of all possible images. We're talking infinities here. One has to make choices. Any such choice had better be based on some fundamental considerations.
The most general methods assume nothing: neither physiological nor phenomenological nor ecological prior knowledge. Here is a simple example, but there are numerous others; most of the ones we can think of have been used on one occasion or another. Given a pixelated image, one simply interchanges two randomly selected pixels and repeats this a number of times. Replacing pixels with random values yields a similar result, technically known as “salt-and-pepper noise” (Jayaraman, Esakkirajan, & Veerakumar,
2009;
Figure 20). A good parameter would be the ratio of the number of swaps to the total number of pixels. A parameter value of zero returns the fiducial image; a value of one yields a totally random image. Such eidolons may well be useful in certain psychophysical contexts. From a phenomenological perspective they are trivial, and from an esthetic perspective they are appalling. They look exactly as they are: alien to the genesis of visual awareness. No doubt one could quantify this by computing various measures typical for cortical representations.
Other well-known examples of this general class involve adding some type of noise pattern to the image. Such methods have frequently been used in vision research. They work best with the abstract, nonsense images commonly used in psychophysics. Recognizable images are remarkably resistant because psychogenesis is expert at beating the “cocktail party effect” (Bronkhorst,
2000; Shinn-Cunningham,
2008), which is why the familiar signal-noise methods from engineering (Kailath, Sayed, & Hassibi,
2000; Kay,
1993; Scharf,
1991) are unlikely to apply.
Other methods base eidolon generation on the essentially arbitrary way digital images are conventionally stored. A case that has become famous in vision research uses blocking (Harmon,
1973; Harmon & Julesz,
1973). Such eidolons have nothing to do with the intrinsic structure of images (e.g., why not have a honeycomb array of hexagonal pixels instead of a Cartesian checkerboard?), the known physiology, or the phenomenology of the visual field (
Figure 21). They allow easy parameterization of structural complexity and are inherently local—both desirable properties.
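Blocking of the Harmon–Julesz type is easily sketched: replace each square tile by its mean luminance. The helper name `block_average` is ours, and image dimensions are assumed to be multiples of the block size for simplicity:

```python
import numpy as np

def block_average(image, block):
    """Blocking: replace each block-by-block tile by its mean luminance.
    The block size parameterizes the structural complexity."""
    h, w = image.shape
    tiles = image.reshape(h // block, block, w // block, block)
    means = tiles.mean(axis=(1, 3))
    # Expand the tile means back to the original resolution.
    return np.repeat(np.repeat(means, block, axis=0), block, axis=1)

img = np.arange(64.0).reshape(8, 8)
blocked = block_average(img, 4)  # 2 x 2 grid of constant tiles
```

Note how the operation is tied to the Cartesian pixel grid: nothing about it respects the intrinsic structure of the image, which is exactly the point made in the text.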
Well-known instances are phase-scrambled images (Oppenheim & Lim,
1981; Thomson,
1999; Vogels,
1999). Here, disarray is applied in the spectral domain. The methods were perhaps inspired by the notion that sine-wave gratings are natural parts of images or somehow special to what the primary visual cortex is up to. Both (mutually related) notions are false, but that is not the point here. What makes Fourier analysis special is that it is global. This has indeed some—though not much—relation to what might be desirable for a biologically viable optic sensor system. The eidolons one obtains look somewhat better than those from the previous example (
Figure 22), yet they don't look natural. Apparently, the global nature of the parts is problematic.
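The idea of phase scrambling can be shown in one dimension with a naive discrete Fourier transform (a hedged Python sketch with hypothetical names; production code would use an FFT on a 2-D image): every amplitude is kept, every phase is randomized, and Hermitian symmetry keeps the result real valued.

```python
import cmath, math, random

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def phase_scramble(signal, rng=None):
    """Keep each Fourier amplitude, replace each phase by a random one."""
    rng = rng or random.Random()
    n = len(signal)
    X = dft(signal)
    Y = [0j] * n
    Y[0] = X[0]                          # DC keeps its (real) value
    for k in range(1, (n + 1) // 2):
        phi = rng.uniform(0, 2 * math.pi)
        Y[k] = abs(X[k]) * cmath.exp(1j * phi)
        Y[n - k] = Y[k].conjugate()      # mirror bin: conjugate phase
    if n % 2 == 0:
        Y[n // 2] = X[n // 2]            # Nyquist bin must stay real
    return [z.real for z in idft(Y)]
```

The amplitude spectrum (and hence the DC level and total power) is untouched, which is exactly why such eidolons are globally, not locally, constrained.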
Engineers do what is possible and most economical, science not being their first priority. Yet, historically, engineers have produced eidolons of considerable interest and importance. We simply mention a few—there are many—of their achievements. Early television engineering was much about bandwidth. Otto Schade (
1948) pioneered the MTF, enabling him to create eidolons that were acceptable to the public for many years. The development of color television tells a similar story. Eidolons were based on opponent channels, with the higher bandwidth devoted to the luminance channel. When digital images became common, the aforementioned JPEG eidolons became of major importance. They are based on rather intricate properties of spatial vision.
Of course, it is much more interesting to consider eidolons that are somehow constrained by scientific understanding of either the neurophysiology or the phenomenology of visual awareness (Stojanoski & Cusack,
2014). A well-known current model of eidolons is due to the work at Eero Simoncelli's lab at New York University (Portilla & Simoncelli,
2000). Their idea is to impose empirical knowledge of the structural complexity of neural activity in V1 as a constraint on the structure of the eidolons. Thus, an eidolon is an image that would ideally evoke neural activity statistically equivalent to that evoked by the fiducial image. This is an important notion, characterizing the bottleneck imposed by V1 (or anything up to V1), very similar to—but hugely more complicated than—the notion of metamerism in colorimetry.
Rather than just analyzing the fiducial image first and then trying random images until you hit on one that has the equivalent statistics, the actual implementation forces an initial noise image into complying with the descriptor values established in the analysis stage. Otherwise, it would be like waiting for the proverbial monkey randomly hitting the keys of a typewriter to finish a perfect transcript of Helmholtz's
Handbook of Physiological Optics. It is a sculpting of essentially random structures to be as equivalent to the fiducial statistics as can be. We describe these methods in
Appendix C.
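The "sculpting" step can be illustrated with a deliberately tiny toy: match just the mean and variance of a noise sample to those of the fiducial. (This is a hedged stand-in with hypothetical names; the actual Portilla-Simoncelli algorithm iterates projections onto a far richer set of wavelet-domain statistics.)

```python
import math

def match_mean_var(noise, target_mean, target_var):
    """Project a noise sample onto the set of signals with the given mean
    and (population) variance: the simplest possible 'sculpting' of noise
    toward the fiducial statistics."""
    n = len(noise)
    m = sum(noise) / n
    v = sum((x - m) ** 2 for x in noise) / n
    scale = math.sqrt(target_var / v) if v > 0 else 0.0
    return [target_mean + scale * (x - m) for x in noise]
```

The point of starting from noise and projecting, rather than sampling random images until one fits, is that the projection reaches the constraint set in a single step per statistic instead of by blind search.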
The Portilla-Simoncelli algorithm produces the “mongrels” that have been used in the highly interesting research in Ruth Rosenholtz's lab at Massachusetts Institute of Technology (Rosenholtz,
2011; Rosenholtz, Huang, & Ehinger,
2012; Rosenholtz, Huang, Raj, Balas, & Ilie,
2012). They have pioneered the use of mongrels as a novel and powerful tool in vision research. That is exactly the intended use of the eidolons proposed here.
Our eidolons are based on the phenomenology of vision rather than neurophysiology. However, on the formal level there are obvious tangencies. Our inspiration also came from the study of painting methods. Visual artists, throughout the centuries, have been involved in the production of eidolons. On the whole they have been very successful, their clients sometimes taking their eidolons for reality. However, they explored the territory thoroughly and reached the dark boundary regions where the eidolons fall apart, perhaps taxing the visual competence of some observers but leaving the majority of their public behind. We find that technical painting methods are closely related to the phenomenology of vision as studied academically (e.g., Cateura,
1995; Jacobs,
1986). Our eidolon factory is based on that.
The eidolon factory described here (technically in
Appendix B; a demonstration program is available—see
Appendix A) offers some desirable features for vision research:
It is formally simple and transparent. It is essentially just the mathematician's toolbox of differential geometry (Bell,
2005; Koenderink,
1990; Koenderink & van Doorn,
1992; Spivak,
1999; ter Haar Romeny,
2008).
It is overall linear, except where essential nonlinearities enter in a transparent manner.
It is algorithmically simple and transparent. No magical numbers. No iterative procedures (
Appendix C). No partial differential equations to solve (Elder & Zucker,
1998). All that happens is the accumulation of mass activity—our synthesis stage.
It is a nice summary account of what the cortex might be computing. Such summary accounts (actually caricatures, of course) might be more useful than an exhaustive description because they appeal to the intuition. Phonebooks are useful but hardly appeal to the understanding.
It is a powerful heuristic in that it is easily expandable. There are only a limited number of crucial elements to be understood.
Eidolons can be obtained in a straightforward manner; only a few (intuitively obvious) parameters need to be set.
This may sound like eidolon factories are just for squares—there being no surprises or challenges for the cool kids! But that would be too limited a picture. Being able to actually understand what is happening is really a source of freedom. The parameters at your disposal are meaningful, and their actions are largely independent of each other. So, you have an interface to the factory that is transparent. There will be no surprises once you understand the basic (simple) structure. This makes it possible to aim your investigation of visual awareness much more precisely. It puts you in control as a scientific investigator. When probing nature (including the human mind!), the surprises should be due to nature rather than the probing tool.
Because it is so simple and direct, the eidolon factory is very easily extended in various directions. For instance, one may apply disarray just as well to opponent color channels as to the orientation of edgelets and so forth. Disarray is also easy to apply in a spatially nonuniform way, opening up many directions of research. A simple example of such focused disarray is shown in
Figure 23.
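One way to obtain such focused disarray is to modulate the displacement amplitude by a window centered on a region of interest. Here is a hedged Python sketch (hypothetical names; a Gaussian window and nearest-pixel sampling are our choices, not prescriptions): pixels near the focus are strongly scrambled, while remote pixels are left untouched.

```python
import math, random

def focused_disarray(image, max_shift, cx, cy, sigma, rng=None):
    """Displace each pixel by a random offset whose amplitude falls off as
    a Gaussian of distance from the focus (cx, cy)."""
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            weight = math.exp(-((x - cx) ** 2 + (y - cy) ** 2)
                              / (2 * sigma ** 2))
            dx = int(round(weight * rng.uniform(-max_shift, max_shift)))
            dy = int(round(weight * rng.uniform(-max_shift, max_shift)))
            sx = min(max(x + dx, 0), w - 1)   # clamp to the image border
            sy = min(max(y + dy, 0), h - 1)
            out[y][x] = image[sy][sx]
    return out
```

Any other spatial weighting (an annulus, an eccentricity-dependent ramp mimicking peripheral vision, and so forth) drops in with a one-line change.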
In conclusion, we have proposed an eidolon factory that is quite open ended in its potential applications and capable of almost endless development. It is also simple enough that one may adapt it for any specific application. Because of its simplicity, it allows one to tailor the nature of eidolons to specific problems.
KG and MV were supported by the Deutsche Forschungsgemeinschaft DFG SFB/TRR 135. MV was supported by the EU Marie Curie Initial Training Network “PRISM” (FP7—PEOPLE-2012-ITN; grant agreement 316746). JW, JK, and AvD were supported by a program of the Flemish Government (METH/14/02), awarded to JW. JK was supported by a Humboldt Award of the Alexander von Humboldt Foundation.
Commercial relationships: none.
Corresponding author: Karl R. Gegenfurtner.
Email: Karl.R.Gegenfurtner@psychol.uni-giessen.de.
Address: Justus-Liebig Universität Giessen, Abteilung Allgemeine Psychologie, Giessen, Germany.
We prepared a demo application (
for PC;
for Mac) that allows one to produce parameterized eidolons of various types for given images and save the results. The application also lets you conveniently look at the data structures that play decisive roles behind the screen. Thus, it shows more than is strictly necessary for eidolon generation. On the other hand, it is by no means a universal eidolon factory: There are things you might want to try that the demo doesn't allow. The reason is simply that we needed to keep the complexity of the interface—already considerable—within reasonable bounds. We concentrated on instances that might prove of immediate interest to vision research.
Because the demo required an extensive interface, we economized on its capabilities. It lets you process only monochrome images that are 512 pixels square. Anything else will be converted to that format. The eidolon machine synthesizes the image on the basis of a scale-space representation of edgelets: boundary representations that (unlike the edginess obtained from edge detectors) retain the polarity along the boundary.
The demo will run on most platforms because it was written in Processing. For the applications, we packaged the Java environment with the code so we can be certain that they will run properly regardless of what you may have installed on your machine.
The demo has an extensive interface that is convenient but perhaps takes getting used to. Most of the ins and outs are explained in the help that is always available under the “H” (or “h”) key.
We find ourselves somewhat in a quandary about how we should describe the algorithmic aspects of the eidolon factory. Because proprietary aspects should not figure in a scientific journal, we cannot use a high-level formal language such as Mathematica (which would be most appropriate and clear from a formal perspective) or an environment such as Matlab (which might appeal more to engineering minds). Differences are substantial—for instance, one line of Mathematica may correspond to a thousand lines of C. Obviously, the former is easier to read than the latter. This is due to the hiding of details that are conceptually irrelevant.
Our implementation is in Processing, which is an open source Java-derived environment that was especially aimed at creative minds. Modern artists and designers use it extensively. We use it all the time in our vision research because it saves us so much time and effort, but we notice that few others in our field even know about it. Java runs on all platforms and is a well-designed object-oriented language. Processing might be “Java without tears” (for suckers), but it has retained these advantages. Almost any other language you might happen to be familiar with would serve just fine. Simply implement our pseudocode in your favorite language. This might (if you are at all familiar with your language) involve a few hours at most (as we actually checked!).
Unfortunately (or not; it depends on your perspective), there is no such thing as a pseudocode standard. So we roll our own. It is perhaps something like formal pseudocode (i.e., not Fortran, Pascal, C, Basic, and so on style pseudocode). So you will see things like the following.
Notice that ignoring what is in between the COMMENT and END COMMENT braces is not going to hurt you. It will have no consequences for the eventual algorithmic implementation. The idea is that the notation is self-documenting, so perhaps occasionally taking notice of comments might be a good idea, as they may have been inserted for some reason. But, in principle, COMMENT means that the lines until END COMMENT can be safely skipped. On the other hand, the DO part “increment counter” is crucial. Failing to increment the counter—whatever that may mean—is surely going to hurt you. A lead to what it might mean can often be gleaned from the context or comments. DO is followed by something that has to be done. Other comments also come as pairs of braces, such as
and so forth. We use indentation to highlight the inclusion structure.
is a function that encapsulates a computation. Notice that the parameter section might be empty, as for a function COIN_FLIP() that returns heads or tails. A program is a collection of functions. One of these is supposed to deliver the final result.
Of course, there are numerous decisions in implementations that certainly may make a difference but can hardly be counted as our business. Here is a simple example. Images are represented in numerous ways in computer memories. But how the bit planes are ordered, and so forth, is not our business. A monochrome point (“pixel”) can be represented by a byte, integer, float, double, and so forth. This is not our business either, but it makes a difference. We will need Fourier-based methods. Whether one uses the latest fast Fourier transform (FFT; Heideman, Johnson, & Burrus,
1985) implementation, the old-fashioned Filon integration (Abramowitz & Stegun,
1972), or something rolled oneself is not our business. But, again, choices often make a difference! Often in computation time, sometimes in precision. They may have distinct limits of applicability. As said before, such technicalities will be skipped here.
2. The basic deterministic structures
The simplest representation is as a number of progressively blurred images at discrete scale levels separated by factors of two (coarse) or square root of two (almost always good enough). The highly blurred images may be subsampled spatially without significant loss, but this is usually inconvenient and memory is cheap.
For a 512 × 512 image (for example), the range of scales would run from 1 (pixel size) to about 128 (one quarter of the image size). With a square root of two factor, that implies more than a dozen levels.
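The level count follows directly from the scale range and the separation factor; a one-line Python check (hypothetical name) confirms the "more than a dozen" figure:

```python
import math

def n_scale_levels(finest, coarsest, factor):
    """Number of discrete scale levels from `finest` to `coarsest`
    (inclusive) when successive levels are separated by `factor`."""
    return int(round(math.log(coarsest / finest) / math.log(factor))) + 1
```

For scales 1 through 128 with a square root of two factor this gives 15 levels; with a factor of two, only 8.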
Building scale space implies
One typically uses FFT methods to do this (the demo uses JTransforms 2015), but Mathematica enables you to simply say “blur the image by so much,” which captures the conceptual content in a direct way. You may want to do some additional housekeeping here—for instance, handle boundary effects in some preferred way, subsample the highly blurred layers, and so forth. We put a few hints at such issues in the comments.
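As a rough one-dimensional stand-in for the pseudocode (Python, hypothetical names; border clamping is one of several reasonable boundary policies), building the scale space amounts to progressive Gaussian blurring:

```python
import math

def gaussian_kernel(sigma):
    """Normalized, truncated Gaussian kernel of standard deviation sigma."""
    r = max(1, int(3 * sigma))
    k = [math.exp(-i * i / (2 * sigma * sigma)) for i in range(-r, r + 1)]
    s = sum(k)
    return [v / s for v in k]

def blur(signal, sigma):
    """Convolve with a Gaussian; borders handled by clamping."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    n = len(signal)
    return [sum(k[r + i] * signal[min(max(t + i, 0), n - 1)]
                for i in range(-r, r + 1)) for t in range(n)]

def build_scale_space(signal, n_levels, factor=math.sqrt(2)):
    """Level i is the fiducial signal blurred at scale factor**i."""
    return [blur(signal, factor ** i) for i in range(n_levels)]
```

The same structure carries over to 2-D images by blurring separably along rows and columns.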
Building the difference scale space implies
Notice that after constructing the difference scale space the scale space itself can be deleted because it can be regained from the difference scale layers.
One catch to be aware of is the DC level because DOG receptive fields are not sensitive to that. In practice, adding a constant suffices, so this is not a problem. You simply retain the coarsest scale-space layer. Thus:
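In the same spirit, a minimal Python sketch (hypothetical names; layers as plain lists) of the difference scale space and its trivial synthesis, with the coarsest layer retained so the DC content survives:

```python
def difference_scale_space(scale_space):
    """Successive differences of scale-space levels (DOG layers), plus the
    coarsest blurred level itself so the DC content is not lost."""
    diffs = [[a - b for a, b in zip(fine, coarse)]
             for fine, coarse in zip(scale_space, scale_space[1:])]
    return diffs, scale_space[-1]

def synthesize(diffs, coarsest):
    """Plain summation recovers the finest scale-space level exactly."""
    out = coarsest[:]
    for layer in reversed(diffs):
        out = [o + d for o, d in zip(out, layer)]
    return out
```

Because the sum telescopes, the scale space itself can indeed be deleted once the difference layers and the coarsest layer are stored.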
The basic ingredient is the Gaussian noise image:
The eidolon factory requires a great many of such images, all mutually independent. For instance, a displacement vector field requires two:
It is a simple matter to impose disarray on a given image:
Notice that the “image” here will usually be a difference scale-space layer. You will perturb many such layers before combining them in the final synthesis. Notice also that there are many additional uses for noise fields. For instance, instead of or in addition to spatial disarray, you may want to perturb the gain, orientation, and so forth of a receptive field. Although we do not consider this in this article, here lies an important field of enquiry.
This is how you construct fractal disarray:
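As one hedged reading of fractal disarray (a one-dimensional Python sketch with hypothetical names; the exact per-level weighting in the factory may differ), independent noise is generated at every scale level, smoothed to that level's scale, and summed: coarse levels shift whole regions coherently while fine levels jitter single pixels.

```python
import math, random

def blur1d(signal, sigma):
    """Gaussian smoothing with clamped borders."""
    r = max(1, int(3 * sigma))
    k = [math.exp(-i * i / (2 * sigma * sigma)) for i in range(-r, r + 1)]
    s = sum(k)
    n = len(signal)
    return [sum(k[r + i] / s * signal[min(max(t + i, 0), n - 1)]
                for i in range(-r, r + 1)) for t in range(n)]

def fractal_displacement(n, n_levels, amplitude, rng):
    """One component of a fractal displacement field: sum independent
    noise over scale levels, each smoothed to its own scale."""
    field = [0.0] * n
    for level in range(n_levels):
        sigma = math.sqrt(2) ** level
        noise = blur1d([rng.gauss(0.0, amplitude) for _ in range(n)], sigma)
        field = [f + x for f, x in zip(field, noise)]
    return field
```

Two such fields (one for each coordinate) give a roughly self-similar disarray across the whole scale range.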
It is easy enough to implement such eidolons (essentially just the regular CAPTCHA or XKCD-emulation method): simply blur the fiducial image and sample it at locally displaced positions.
Indeed, nothing more complicated than that. The resulting image is both blurred and locally scrambled. Such eidolons are extremely simple to generate yet are already an interesting class for vision research—in fact, most likely a good starting platform in many cases.
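A one-dimensional Python sketch of this recipe (hypothetical names; our reading of "blurred and locally scrambled", not the authors' exact formula):

```python
import math, random

def blur1d(signal, sigma):
    """Gaussian smoothing with clamped borders."""
    r = max(1, int(3 * sigma))
    k = [math.exp(-i * i / (2 * sigma * sigma)) for i in range(-r, r + 1)]
    s = sum(k)
    n = len(signal)
    return [sum(k[r + i] / s * signal[min(max(t + i, 0), n - 1)]
                for i in range(-r, r + 1)) for t in range(n)]

def simple_eidolon(signal, sigma, max_shift, rng):
    """Blur the fiducial signal, then sample it at randomly displaced
    (border-clamped) positions: blurred and locally scrambled at once."""
    blurred = blur1d(signal, sigma)
    n = len(signal)
    return [blurred[min(max(t + int(round(rng.uniform(-max_shift, max_shift))),
                            0), n - 1)]
            for t in range(n)]
```

Two intuitively meaningful parameters, the blur scale and the maximum displacement, span the whole family, which is what makes this such a convenient starting platform.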
Differential geometry: Local geometry, defined by application to regions of interest that have the same size as the operators (e.g., edge detectors).
Edgelet: Local component of an edge. In differential geometry, edges are considered as a string of spatially contiguous and aligned edgelets.
Eidolon: Class of stimuli that are equivalent to a given fiducial stimulus along a given perceptual continuum. Stimuli that are metameric are eidolons too, but the definition extends to stimuli that are perceptually equivalent along a given dimension while still being distinguishable in other aspects. Notice that equivalence is defined in a phenomenological sense, and consequently it is subjective in nature.
Eidolon factory: Algorithm that can be used to modify images. Its parameterization defines the physical space in which perceptual equivalence can be established through psychophysical methods. Many eidolon factories are possible beyond the one we introduce in this work.
Local sign: Positional signature (German Lokalzeichen). Psychophysical bridge between neural representation and awareness of position.
Metamer: Class of stimuli that are perceptually indistinguishable under some specific viewing condition.
Modulation transfer function (MTF): Let v be a given spatial frequency of a grating stimulus, C0 the physical contrast of the stimulus, and Ci the transferred contrast (i.e., the contrast after the stimulus has been transferred through an optical device, or the effective contrast in a visual system); then MTF(v) = Ci/C0.
Psychogenesis: Process by means of which a mental state comes to be. In the present study, we are primarily referring to the process leading to visual awareness when human observers view an image.
Tarachopia: Scrambled vision. Concept proposed by Hess (
1982) to characterize the phenomenology of amblyopia as well as the observation that amblyopia affects pattern discrimination to a larger extent than simple visual detection.
Translation invariance: Indicates that the measurement of a property is independent of the location at which the measurement takes place. Specifically, in the case of Fourier transform, it indicates that the amplitude spectrum is identical if the image is shifted.