Abstract
Subjects can differ greatly in visual performance depending on their expertise and innate ability. Experts and novices, when presented with the same visual stimuli, organize and process them differently. Here we define experts as systems that categorize stimuli at a subordinate level, and novices as systems that categorize the same stimuli at a coarser grain. We model both expert and novice as deep convolutional neural networks. After training each model, we highlight the regions of each image that are relevant for predicting the target class. We obtain these attention maps by using the gradients of the target class flowing into the final convolutional layer to produce a coarse localization map. We then train a new novice model by adding the attention map obtained from the expert to the new novice model's input image. We report two main results: (1) the attention maps used by the expert network have higher entropy and smaller, more local features than those of the novice networks, suggesting that the expert looks at multiple locations to make a classification decision; and (2) the attention map used by the expert can be used to train the novice, resulting in faster training and better performance. We consider this analogous to the expert telling the novice where to look for discriminative regions of the image.
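As an illustration of the two quantities mentioned above, the following minimal sketch computes a gradient-weighted coarse localization map and its entropy. It rests on assumptions that the abstract does not confirm: channel weights are taken to be the spatial average of the gradients, the weighted sum is passed through a ReLU, and entropy is taken to be the Shannon entropy of the map normalized to sum to one. The function names `grad_cam` and `map_entropy` are hypothetical, and the activations and gradients are assumed to have already been extracted from the final convolutional layer.

```python
import math


def grad_cam(activations, gradients):
    """Coarse localization map from final-conv-layer activations and gradients.

    Both arguments are lists of K channel maps, each an H x W nested list.
    Assumption: each channel weight is the spatial mean of its gradient map.
    """
    h = len(activations[0])
    w = len(activations[0][0])
    # One weight per channel: global average of the gradient over space.
    weights = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    cam = [[0.0] * w for _ in range(h)]
    for wk, act in zip(weights, activations):
        for i in range(h):
            for j in range(w):
                cam[i][j] += wk * act[i][j]
    # ReLU keeps only locations with a positive influence on the target class.
    return [[max(0.0, v) for v in row] for row in cam]


def map_entropy(cam):
    """Shannon entropy (in nats) of an attention map normalized to sum to 1.

    Assumption: this is one plausible definition of attention-map entropy.
    Returns 0.0 for an all-zero map.
    """
    total = sum(sum(row) for row in cam)
    if total <= 0:
        return 0.0
    entropy = 0.0
    for row in cam:
        for v in row:
            if v > 0:
                p = v / total
                entropy -= p * math.log(p)
    return entropy
```

Under this definition, a map concentrated on a single location has zero entropy, while a map spread evenly over many locations has the maximum entropy, which matches the reading that a higher-entropy map attends to multiple locations.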