Abstract
The goal of this research project is to create a model of the human visual system with anatomical and experiential constraints. The anatomical constraints implemented in the model so far include a foveated retina, the log-polar transform between the retina and V1, and the bifurcation between the central and peripheral pathways of the visual system. The experiential constraint consists of a realistic training set that models human visual experience. The dataset most often used for training deep networks is ImageNet, 1.2M images in 1,000 categories that bear little resemblance to human visual experience. The categories are a rather Borgesian set, including (among more common ones) ‘abacus’, ‘lens cap’, ‘whiptail lizard’, ‘ptarmigan’, ‘abaya’, ‘viaduct’, ‘maypole’, ‘monastery’, and 120 dog breeds. Any network trained on these categories becomes a dog expert, a level of expertise found in only a small subset of the human population. The goal of the “Day in the Life” project is to collect a more realistic dataset of what humans observe and fixate upon in daily life. Using a wearable eye tracker with an Intel RealSense scene camera that provides depth information, we are recording data from subjects as they go about their day. We then use a deep network to segment and label the objects that are fixated. The aim is to develop a training set that is faithful to the distribution of what individuals actually look at in terms of frequency, dwell time, and distance. Training a visual system model on this data should yield representations that more closely mimic those developed in visual cortex. This data should also be useful in vision science, as frequency, probably the most important variable in psycholinguistics, has not typically been manipulated in human visual processing experiments for lack of norms. Here we report some initial results from this project.
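To make the two computational ingredients above concrete, here is a minimal sketch of the log-polar resampling in Python with NumPy. The function name (log_polar) and its parameters (n_r, n_theta, r_min) are illustrative choices, not the project's actual code, and it assumes a square, center-fixated input image:

    import numpy as np

    def log_polar(img, n_r=96, n_theta=96, r_min=1.0):
        """Resample a square, center-fixated image into (log-eccentricity,
        polar angle) coordinates. Successive output rows cover exponentially
        larger rings of the input, so the fovea claims most of the map,
        as in the retina-to-V1 mapping."""
        h, w = img.shape[:2]
        cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
        r_max = min(cy, cx)
        # Log-spaced eccentricities and uniformly spaced polar angles.
        radii = np.exp(np.linspace(np.log(r_min), np.log(r_max), n_r))
        angles = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
        ys = cy + radii[:, None] * np.sin(angles)   # shape (n_r, n_theta)
        xs = cx + radii[:, None] * np.cos(angles)
        ys = np.clip(np.round(ys), 0, h - 1).astype(int)
        xs = np.clip(np.round(xs), 0, w - 1).astype(int)
        return img[ys, xs]   # nearest-neighbor sampling

A training distribution matched to observed viewing statistics can be sketched the same way; the fixation-log format here is hypothetical:

    # Hypothetical fixation log: (object label, dwell time in seconds).
    fixation_log = [("mug", 0.8), ("laptop", 2.4), ("doorway", 0.3)]
    labels, dwell = zip(*fixation_log)
    p = np.asarray(dwell) / np.sum(dwell)
    # Training batch whose label frequencies match observed dwell time.
    batch = np.random.choice(labels, size=32, p=p)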
Acknowledgement: NSF grant SMA-1640681