Early attempts to model the extremely high levels of positional sensitivity shown by human observers explored the possibility of using a distributed set of orientation- and frequency-selective filters (Carlson & Klopfenstein,
1985; Klein & Levi,
1985; Sullivan, Oatley, & Sutherland,
1972; Wilson & Gelb,
1984). Although filter distribution models of this general class vary in their implementation, a common assumption is that the spatial relationship between objects is encoded by a filter large enough to encompass both simultaneously. The advantage of such schemes is that relative position is encoded without the need for information about receptive field position. However, the requirement for filters large enough to span both objects has meant that this approach essentially fails for separations larger than about a degree. Moreover, this class of model cannot account for several other aspects of behavioral data, such as the influence of flanking targets or contrast polarity effects (Burbeck, 1987; Morgan & Ward, 1985).
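To make the shared assumption concrete, the sketch below is a single-filter caricature of our own, not an implementation of any of the cited models (which use banks of filters at multiple scales and orientations); the envelope width and carrier frequency are arbitrary illustrative values. It encodes the separation of two points with one Gabor-like filter centered between them and shows the limitation noted above: within the envelope the response is an ambiguous, non-monotonic function of separation, and beyond the envelope it collapses toward zero.

```python
import numpy as np

def gabor(x, sigma=0.3, freq=1.5):
    """Odd-symmetric Gabor: Gaussian envelope times a sinusoidal carrier (cycles/deg)."""
    return np.exp(-0.5 * (x / sigma) ** 2) * np.sin(2 * np.pi * freq * x)

def spanning_filter_response(separation_deg, sigma=0.3, freq=1.5):
    """Response of one large filter, centered midway between two unit impulses.

    Relative position is read out without knowing where the filter sits on the
    retina, but only while both impulses fall under the Gaussian envelope.
    """
    x = np.array([-separation_deg / 2.0, separation_deg / 2.0])
    return np.abs(gabor(x, sigma, freq)).sum()

if __name__ == "__main__":
    for sep in (0.1, 0.2, 0.4, 0.8, 1.6, 3.2):
        print(f"separation {sep:3.1f} deg -> response {spanning_filter_response(sep):.3f}")
    # Responses rise, fall, and vanish as separation grows: ambiguous within the
    # envelope and essentially zero once the two points exceed the filter's extent.
```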
These problems led to the development of hybrid filter-local sign models, in which a filter-based approach provides an initial representation and a second stage of analysis computes the distance between features (e.g., Burbeck, 1987). Here, local signs provide retinotopic position information (Burbeck, 1987) or guide the intelligent selection of filters (Morgan, Hole et al., 1990; Morgan, Ward et al., 1990).
More recent population coding models of visual space have adopted an implementation in which stimulus locations are represented intrinsically rather than via an external coordinate frame of reference, such as a retinotopic map (Lehky & Sereno, 2011). An intrinsic coding scheme, in which only relative position matters, has the advantage that visual representations are largely invariant to changes in scale or viewing position, transformations that the visual system must commonly accommodate. However, purely intrinsic schemes present a considerable challenge for the visual guidance of motor commands, where eye or hand movements must be directed toward specific physical locations. Intrinsic spatial coding models predict that populations in which only relatively small receptive fields are available, such as those typically found in striate cortex (V1), yield markedly distorted spatial representations (Lehky & Sereno, 2011).
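A minimal sketch of that prediction, loosely in the spirit of Lehky and Sereno's (2011) approach but not their actual model, is given below; the receptive-field widths, stimulus range, and use of classical multidimensional scaling are our illustrative assumptions. Each stimulus position is represented only by its population response vector, and a spatial layout is recovered from pairwise response dissimilarities, with no retinotopic labels. When receptive fields are small relative to the stimulus separations, the dissimilarities saturate and the recovered configuration is badly distorted; with large receptive fields the recovered layout is far closer to veridical.

```python
import numpy as np

def population_response(stim_deg, rf_centers_deg, rf_sigma):
    """Responses of a population of 1-D Gaussian receptive fields to each stimulus."""
    return np.exp(-0.5 * ((stim_deg[:, None] - rf_centers_deg[None, :]) / rf_sigma) ** 2)

def intrinsic_layout(responses):
    """Recover a 1-D configuration from pairwise response dissimilarities alone
    (classical metric MDS); only relative relations between stimuli are used."""
    d2 = np.sum((responses[:, None, :] - responses[None, :, :]) ** 2, axis=-1)
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n               # centering matrix
    b = -0.5 * j @ d2 @ j                             # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)
    return vecs[:, -1] * np.sqrt(max(vals[-1], 0.0))  # leading MDS dimension

if __name__ == "__main__":
    stim = np.linspace(0.0, 8.0, 9)                   # physical positions, 1 deg apart
    rf_centers = np.linspace(-2.0, 10.0, 241)
    for rf_sigma in (0.25, 4.0):                      # small (V1-like) vs. large fields
        layout = np.sort(intrinsic_layout(population_response(stim, rf_centers, rf_sigma)))
        print(f"rf_sigma={rf_sigma:4.2f} deg -> recovered spacings:",
              np.round(np.diff(layout), 2))
    # With small receptive fields, all widely separated stimuli look roughly equally
    # dissimilar, so the recovered spacings bear little relation to the true uniform
    # 1-deg spacing; with large fields the recovered layout is far less distorted.
```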
While most of the data generated from intrinsic coding models have been derived from single stimulus presentations, some model predictions have been produced for situations in which two stimuli are presented either one after the other or simultaneously in visual space. Interestingly, the types of distortion that result from simultaneous presentation are qualitatively in agreement with previous psychophysical observations (e.g., Badcock & Westheimer,
1985). Whether intrinsic coding schemes can be adapted to predict the size-related distortions of visual space we report here remains a future computational challenge.