Abstract
Real-world scenes can be quickly categorized (bedroom, forest, etc.). This “gist” is based on global information, integrated across the entire scene. However, scenes can have components consistent with two or more gists; for example, a bedroom with a forest view outside a large window. We call such scenes “chimeric.” What would be the gist of such a chimeric scene: “bedroom”, “forest”, both, or neither? In Experiment 1, 15 observers indicated whether or not a pre-cued scene category appeared in a rapid serial visual presentation (RSVP) stream. Each 50 Hz RSVP stream contained 5 Simoncelli masks, with the scene in the second position. Six scene categories, three indoor (bedroom, bathroom, office) and three outdoor (forest, beach, desert), were shown in three types of trials (regular, standard-chimeric, non-standard-chimeric). Regular images contained information consistent with a single category. Chimeric images contained information from two categories. Standard chimeras were scenes that might normally occur (e.g., kitchen with a beach view). Non-standard chimeras were unusual but still real scenes (e.g., a bed placed in a forest). Typicality of chimeras was based on ratings from 50 observers on Mechanical Turk. Cued category and trial type varied randomly, intermingled with filler trials. d’ for chimeras was barely above chance (d’ = 0.19), significantly worse than for regular scenes (d’ = 0.93). Standard and non-standard chimeras did not differ. In Experiment 2, 15 observers made 4AFCs following RSVP presentation of chimeric scenes (e.g., pick Bedroom, Forest, Both, or Neither). Correct responses (29%) were barely, but significantly, above the 25% chance level for pooled data. Chi-square tests were significant for only three of 16 individuals, taken separately. While humans have an impressive ability to extract the gist of a scene in a fraction of a second, that ability may be fragile, failing dramatically in chimeric scenes.
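For readers unfamiliar with the sensitivity measure d’ reported for Experiment 1, the following is a minimal sketch of how such a value is commonly computed from yes/no detection counts. The abstract does not state how d’ was calculated, so the function name `d_prime`, the log-linear correction (adding 0.5 to each cell), and the trial counts in the usage lines are all assumptions made for illustration, not the authors' procedure or data.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Hypothetical helper: the log-linear correction below is an
    assumption, used only to keep both rates strictly between 0 and 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(hit_rate) - z(fa_rate)


# Made-up counts, not data from the study.
chance_level = d_prime(50, 50, 50, 50)   # equal hit and false-alarm rates
above_chance = d_prime(80, 20, 20, 80)   # more hits than false alarms
```

With equal hit and false-alarm rates the index is zero, which is the sense in which a d’ near 0 indicates performance close to chance.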
Meeting abstract presented at VSS 2015