Abstract
The performance of artificial intelligence (AI) has reached expert levels for several medical image screening tasks. Nevertheless, in clinical settings, the combination of AI and expert radiologists often fails to produce better outcomes than either working alone. We gave non-expert observers simulated AI assistance to search for target textures in arrays of eight color textures arranged on a virtual circle. Target colors were drawn from a distribution that differed by 2.2 standard deviations from the distractor distribution. Four conditions were tested: 1) “On-demand AI” provided the probability that a specific item was a target, but only when queried by the observer. 2) “Image triage” used a liberal criterion to filter the images shown to the observer: if the AI was “sure” that an item was not a target, that item was not shown, reducing the set size; if the AI eliminated all items, the entire trial was withheld. 3) “Both”: images were first triaged, and observers could then query the remaining stimuli for the probabilities that they were targets. 4) The “Control” condition had no simulated AI. Prevalence was 50% or 10% in different blocks. Results: Triage improved performance; On-demand AI did not. Prevalence did not change this pattern, although miss errors were elevated at low prevalence. In Experiment 2, On-demand AI was replaced by a “Second Reader” AI that gave its opinion after the observer’s response and allowed the observer to change that response. Again, Triage helped; Second Reader AI did not. Unlike Wolfe and Nartker’s (VSS 2019) results, using Both (AI for triage and as a second reader) was not better than Triage alone, perhaps because the Second Reader was not effective in this paradigm. Similar results were obtained at low prevalence. In search tasks, Image Triage, in which AI saves time by eliminating some stimuli, may be more promising than methods in which AI tries to offer positive advice.
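The simulated-AI conditions can be made concrete with a small sketch. The code below is an illustrative assumption, not the authors' implementation: it models distractor and target colors as unit-variance Gaussians whose means differ by 2.2 SD, has the "AI" report a per-item posterior probability of being a target (the information an On-demand query would return), and applies a hypothetical liberal cutoff (`TRIAGE_THRESHOLD`) to drop items for Image Triage. All function names, the per-item prior, and the threshold value are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed parameters (illustrative only): distractor colors ~ N(0, 1),
# target colors ~ N(2.2, 1), i.e., a 2.2 SD separation as in the abstract.
D_PRIME = 2.2
TRIAGE_THRESHOLD = 0.10   # hypothetical "liberal" criterion for ruling items out

def simulate_display(set_size=8, prevalence=0.5):
    """Draw one display; the display contains a single target with p = prevalence."""
    labels = np.zeros(set_size, dtype=bool)
    if rng.random() < prevalence:
        labels[rng.integers(set_size)] = True
    values = rng.normal(loc=labels * D_PRIME, scale=1.0)  # item "color" values
    return values, labels

def ai_target_probability(values, prevalence=0.5, set_size=8):
    """Posterior probability that each item is a target, given its color value."""
    prior_target = prevalence / set_size               # simple per-item prior (assumption)
    like_target = np.exp(-0.5 * (values - D_PRIME) ** 2)
    like_distractor = np.exp(-0.5 * values ** 2)
    return (like_target * prior_target /
            (like_target * prior_target + like_distractor * (1 - prior_target)))

def triage(values, prevalence=0.5):
    """Image triage: drop items the AI is 'sure' are not targets."""
    keep = ai_target_probability(values, prevalence) >= TRIAGE_THRESHOLD
    return values[keep], keep   # if no items survive, the whole trial would be withheld

# Example: one low-prevalence (10%) trial
values, labels = simulate_display(prevalence=0.10)
print("AI posteriors:", np.round(ai_target_probability(values, 0.10), 3))
print("items shown after triage:", int(triage(values, 0.10)[1].sum()), "of 8")
```

Under this sketch, the On-demand and Second Reader conditions would expose `ai_target_probability` to the observer (before or after their response, respectively), while Image Triage silently applies `triage` to shrink the search set.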