Abstract
People can rapidly perform simple visual tasks, such as object detection or categorization, suggesting that feed-forward visual processing suffices for these tasks. However, more complex tasks, such as precise localization, may require high-resolution information available in early areas of the visual hierarchy. Top-down feedback processing that traverses several stages of the visual hierarchy allows access to this information, but the traversal takes additional processing time (Tsotsos et al., 2008). Motivated by this, we hypothesized that a localization task requiring precise location information represented in early visual areas would need longer processing time than a simple categorization task. Each participant performed both a categorization (animal detection) task and a feature localization task. We constrained stimulus presentation durations and compared the processing time needed to perform each task. If feed-forward processing suffices for a task, performance should reach asymptote at short presentation durations, whereas if the task employs feedback processing, performance should improve gradually as duration increases. In Experiment 1, which used simple images, both categorization and localization performance improved sharply up to 100 ms and then leveled off. Feature localization thus mirrored previously reported rapid categorization and showed no evidence of feedback processing, indicating that the task could be performed using coarse location information obtained via feed-forward processing. In Experiment 2, more attention-demanding and ecologically valid images were used as stimuli. Categorization performance again plateaued after 100 ms, as in Experiment 1. However, as hypothesized, localization precision improved gradually as presentation duration increased.
A piecewise linear model explained these data better than a simple linear model, suggesting that both feed-forward and feedback processing contributed to localization, but over different temporal ranges. We conclude that feedback processing is necessary for visual tasks that require high-resolution representations, such as precise localization in a conflicting context.
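The model comparison described above can be illustrated with a minimal sketch. The abstract does not state how the models were fitted or compared, so everything here is an assumption made for illustration: a least-squares fit, a continuous two-segment model whose breakpoint is chosen by grid search, and AIC as the comparison criterion. The durations and error values are synthetic and are not data from the study.

```python
import numpy as np

def fit_linear(x, y):
    """Least-squares fit of y = a + b*x. Returns (coefficients, residual sum of squares)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    return coef, rss

def fit_piecewise(x, y, candidates):
    """Continuous two-segment fit y = a + b*x + c*max(0, x - bp).

    The breakpoint bp is picked from `candidates` by grid search
    (an illustrative choice, not the authors' stated procedure).
    Returns (bp, coefficients, residual sum of squares).
    """
    best = None
    for bp in candidates:
        X = np.column_stack([np.ones_like(x), x, np.maximum(0.0, x - bp)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ coef) ** 2))
        if best is None or rss < best[2]:
            best = (bp, coef, rss)
    return best

def aic(rss, n, k):
    """AIC under Gaussian errors, up to an additive constant; k = number of fitted parameters."""
    return n * np.log(rss / n) + 2 * k

# Synthetic, hypothetical data: localization error falls steeply up to 100 ms,
# then more slowly. A small fixed perturbation stands in for measurement noise.
x = np.array([25, 50, 75, 100, 150, 200, 250, 300], dtype=float)  # duration (ms)
wiggle = np.array([0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5])
y = 60.0 - 0.4 * x + 0.35 * np.maximum(0.0, x - 100.0) + wiggle

lin_coef, lin_rss = fit_linear(x, y)
bp, pw_coef, pw_rss = fit_piecewise(x, y, candidates=x[1:-1])

n = len(x)
aic_lin = aic(lin_rss, n, k=2)  # intercept, slope
aic_pw = aic(pw_rss, n, k=4)    # intercept, two slopes, breakpoint

print(f"linear:    RSS={lin_rss:.2f}  AIC={aic_lin:.2f}")
print(f"piecewise: RSS={pw_rss:.2f}  AIC={aic_pw:.2f}  breakpoint={bp:.0f} ms")
```

A lower AIC for the piecewise model means the extra slope and breakpoint are justified by the improvement in fit, which is the pattern the abstract reports for localization in Experiment 2.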
Acknowledgement: JKT: Air Force Office of Scientific Research (FA9550-18-1-0054), the Canada Research Chairs Program (950-219525), and the Natural Sciences and Engineering Research Council of Canada (RGPIN-2016-05352); MF: the Natural Sciences and Engineering Research Council of Canada Discovery Grants (RGPIN-2016-05296) and the Canada Foundation for Innovation Leaders Opportunity Fund.