Abstract
“The Model” (a.k.a. “TM”; Dailey and Cottrell, 1999) is a biologically plausible neurocomputational model of face and object recognition. Developed over the last 25 years, TM has been used successfully to model many cognitive phenomena, such as facial expression perception (Dailey et al., 2002), recruitment of the FFA for other categories of expertise (Tong et al., 2008), and the experience moderation effect on the correlation between face and object recognition (Wang et al., 2014). However, because TM is a “shallow” model, it cannot develop the rich feature representations needed for challenging computer vision tasks. Meanwhile, recent deep convolutional neural networks produce state-of-the-art results on many computer vision benchmarks, but they have not been used in cognitive modeling. Their deep architecture allows the network to develop rich high-level features that generalize well to novel visual tasks. However, these deep learning models are trained in a fully supervised manner, which seems implausible for the early visual system. Here, “The Deep Model” (TDM) bridges TM and deep learning to create a “gradually” supervised deep architecture that is biologically plausible and performs well on computer vision tasks. We show that, by applying the sparse PCA and RICA algorithms to natural image datasets, we can obtain center-surround color-opponent receptive fields resembling those of LGN cells, and Gabor-like filters resembling V1 simple cells. This suggests that an unsupervised learning process underlies the development of the early visual system. We employ this insight to develop a gradually supervised deep neural network and test it on standard computer vision and cognitive modeling tasks.
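Since the abstract names RICA as one of the unsupervised algorithms applied to natural images, the sketch below shows a minimal RICA (reconstruction ICA) objective fit to image patches. It is an illustration only: the patch size, filter count, sparsity weight `lam`, and the random placeholder array `X` are assumptions standing in for the real whitened natural-image patches; with such patches, the rows of `filters` would be expected to develop the Gabor-like (grayscale) or center-surround color-opponent (color) structure described above.

```python
# Minimal RICA sketch: minimize ||X W^T W - X||^2 / m + (lam/m) * smooth-L1(X W^T).
# All sizes and hyperparameters below are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

patch_dim = 64        # 8x8 grayscale patches (assumed)
n_filters = 64        # number of receptive fields to learn (assumed)
n_patches = 5000
lam = 0.05            # sparsity weight (assumed)
eps = 1e-2            # smoothing constant for the L1 penalty

# Placeholder standing in for ZCA-whitened patches sampled from natural images.
X = rng.standard_normal((n_patches, patch_dim))

def rica(w_flat):
    """Return the RICA cost and its gradient with respect to the filter matrix W."""
    W = w_flat.reshape(n_filters, patch_dim)
    Z = X @ W.T                     # feature responses      (m x k)
    D = Z @ W - X                   # reconstruction error   (m x d)
    S = Z / np.sqrt(Z ** 2 + eps)   # derivative of the smooth L1 w.r.t. Z
    m = X.shape[0]
    cost = (np.sum(D ** 2) + lam * np.sum(np.sqrt(Z ** 2 + eps))) / m
    grad = (2.0 * W @ (D.T @ X + X.T @ D) + lam * S.T @ X) / m
    return cost, grad.ravel()

w0 = 0.01 * rng.standard_normal(n_filters * patch_dim)
res = minimize(rica, w0, jac=True, method="L-BFGS-B", options={"maxiter": 200})
filters = res.x.reshape(n_filters, patch_dim)  # each row is a learned receptive field
```

In the gradually supervised scheme described above, filters learned this way would initialize the early layers of the deep network, with supervision applied only at later stages; the exact hand-off is not specified in the abstract, so this sketch covers only the unsupervised step.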
Meeting abstract presented at VSS 2015