Target-object guidance is believed to result when image locations have features similar to those of the target-object representation (Zelinsky et al., 2020a). Target guidance is therefore strongest on target-present (TP) trials, where a target actually appears in the image, but a similarly computed, albeit weaker, target guidance exists in target-absent (TA) search. To study how target guidance in a search task compares to center bias, saliency, and object recognition uncertainty, we need a method for obtaining a target map that reflects a bias for target features in a visual input. As already reviewed, there are many methods for doing this, but in the interest of keeping the state representations as comparable as possible in our model comparison, we used the same MaskRCNN object proposal method (He et al., 2017) that we used to obtain an object uncertainty map. However, different confidence thresholds were used depending on whether the search was target present or target absent. For TP search, we obtained the MaskRCNN object proposal bounding box in the image that had a confidence score greater than 0.9 that the object was an exemplar of the target category. We chose this high confidence threshold to ensure that the target was the only object selected in the scene, which was true most of the time. Moreover, the intersection over union (IoU) of this bounding box with the ground-truth target-object labels from COCO-Search18 was 0.826, thereby validating our use of the MaskRCNN method.
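For concreteness, the sketch below illustrates this selection step, assuming the pretrained Mask R-CNN distributed with torchvision; the helper names (select_tp_box, iou) are illustrative rather than taken from our implementation, and only the 0.9 threshold and the IoU validation follow the description above.

```python
# A minimal sketch (illustrative, not the exact implementation) of the TP
# selection step, using the pretrained Mask R-CNN from torchvision.
import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

def select_tp_box(image, target_label, conf_thresh=0.9):
    """Return the proposal box whose confidence of being an exemplar of
    the target category exceeds conf_thresh (0.9, as described above)."""
    with torch.no_grad():
        pred = model([image])[0]  # image: CxHxW float tensor in [0, 1]
    keep = (pred["labels"] == target_label) & (pred["scores"] > conf_thresh)
    boxes = pred["boxes"][keep]           # (N, 4), (x1, y1, x2, y2) format
    return boxes[0] if len(boxes) else None  # usually one box survives

def iou(a, b):
    """Standard intersection over union of two (x1, y1, x2, y2) boxes,
    usable to check a selected box against a ground-truth label."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union
```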
We then obtained a target map (Target) by applying a 2D Gaussian (σ = one-fourth of the box height, h_b, as done for the center bias map, and size = the image height, h_im, resized to the box dimensions) at the center of this bounding box.
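This construction might be sketched as follows, assuming NumPy and scikit-image (neither is named above) and our reading that the kernel is built at image-height size with σ = h_im/4, as for the center bias map, so that resizing it to the box yields the stated effective σ of h_b/4:

```python
import numpy as np
from skimage.transform import resize  # illustrative choice of resizer

def gaussian_kernel(size, sigma):
    """Square 2D Gaussian of side `size`, peaking at the kernel center."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    return np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))

def tp_target_map(im_h, im_w, box):
    """TP target map: a Gaussian built at size = image height (with
    sigma = one-fourth of that size, as for the center bias map) is
    resized to the box dimensions, making the effective vertical sigma
    one-fourth of the box height, and placed at the box center."""
    x1, y1, x2, y2 = (int(v) for v in box)  # box assumed inside the image
    kernel = resize(gaussian_kernel(im_h, im_h / 4.0), (y2 - y1, x2 - x1))
    tmap = np.zeros((im_h, im_w), dtype=np.float32)
    tmap[y1:y2, x1:x2] = kernel  # Gaussian peak sits at the box center
    return tmap
```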
In the case of TA search, we simply lowered the MaskRCNN confidence threshold to 0.02, which was necessary because the confidence that a non-target object is the target is usually much lower than the confidence for actual target objects. A target map was then obtained in a manner similar to TP search: we applied the same 2D Gaussian used for TP search at the center of every bounding box with a recognition confidence value greater than 0.02, again assuming that there are features at these bounding box locations that guide attention in proportion to their target similarity.
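The TA map then reduces to repeating the same placement over every above-threshold proposal. How overlapping Gaussians should be combined is not specified above, so the pointwise maximum in the sketch below is one plausible choice (a sum would be another):

```python
def ta_target_map(im_h, im_w, boxes, scores, conf_thresh=0.02):
    """TA target map: the same Gaussian placed at the center of every
    proposal whose target-category confidence exceeds 0.02."""
    tmap = np.zeros((im_h, im_w), dtype=np.float32)
    for box, score in zip(boxes, scores):
        if score > conf_thresh:
            # Combine overlapping Gaussians by pointwise maximum (one
            # plausible choice; the text does not specify the rule).
            tmap = np.maximum(tmap, tp_target_map(im_h, im_w, box))
    return tmap
```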
Note that, whereas more sophisticated methods have been developed for predicting search fixations (Yang et al., 2020; Zelinsky et al., 2020a), we thought it best to err on the side of interpretability when selecting a method for obtaining a target map; interpretability is often lacking in more sophisticated deep-learning methods. Our implementation of a target map is a simple bias, much like the center bias, only the bias is introduced at the detected target locations. Given that our goal was to weight the contributions of different features in a comparison, rather than to best predict fixation locations, we believe the interpretability of the MaskRCNN method is a strength.