Abstract
We investigated how well people discriminate between different statistical structures in letter sequences. Specifically, we asked to what extent people rely on feature-based aspects versus lower-level statistics of the input when the input is generated by simple or by more hierarchical processes. Using two symbols, we generated twelve-element sequences according to one of three generative processes: a biased coin toss, a two-state Markov process, and a hierarchical Markov process, in which the states of the higher-order model determine the parameters of the lower-order model. Subjects performed sequence discrimination in a 2-AFC task: in each test trial, they had to decide whether two sequences originated from the same process or from different ones. We analyzed stimulus properties of the three sets of strings and trained a machine learning algorithm to discriminate between the stimulus classes based either on the identity of the elements in the strings or on a feature vector derived for each string. The feature vector used 13 of the most common features, split evenly between summary statistics (mean, variance, etc.) and feature-based descriptors (repetitions, alternations). The learning algorithm and the subjects were trained and tested on the same sequences in order to identify the most significant features used by the machine and by humans, and to compare the two rankings. There was significant agreement between the feature rankings for machine and humans. Both used a mixture of feature-based and statistical descriptors. The two most important features for humans were the ratio between the relative frequencies of the symbols and the existence of repeating triples. Repetitions of length three or higher were consistently ranked higher than alternations of the same length. We found that, without further help, humans did not take into account the complexity of the generative processes.
Acknowledgements: Marie-Curie CIG 618918, NIH EY019889
Meeting abstract presented at VSS 2015
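
For readers who want a concrete picture of the three generative processes and of the kind of descriptors named in the abstract, the following is a minimal illustrative sketch, not the authors' stimulus code. The abstract does not report parameter values, symbol identities, or exact feature definitions, so everything here is an assumption: the symbols "A" and "B", the parameters `p_a`, `p_stay`, `p_stay_by_state`, and `p_switch_state`, the rule by which the higher-order state changes, and the definition of `freq_ratio` are all hypothetical choices made for illustration.

```python
import random

SYMBOLS = "AB"   # hypothetical two-symbol alphabet
LENGTH = 12      # twelve-element sequences, as in the abstract


def other(symbol):
    """Return the symbol that is not `symbol`."""
    return "B" if symbol == "A" else "A"


def biased_coin(p_a, rng, n=LENGTH):
    """Independent draws; each element is 'A' with probability p_a."""
    return "".join("A" if rng.random() < p_a else "B" for _ in range(n))


def markov(p_stay, rng, n=LENGTH):
    """Two-state Markov chain: repeat the previous symbol with probability p_stay."""
    seq = [rng.choice(SYMBOLS)]
    for _ in range(n - 1):
        prev = seq[-1]
        seq.append(prev if rng.random() < p_stay else other(prev))
    return "".join(seq)


def hierarchical_markov(p_stay_by_state, p_switch_state, rng, n=LENGTH):
    """A hidden higher-order state selects p_stay for the lower-order chain.

    Assumption: the higher-order state advances cyclically with
    probability p_switch_state before each new element.
    """
    state = rng.randrange(len(p_stay_by_state))
    seq = [rng.choice(SYMBOLS)]
    for _ in range(n - 1):
        if rng.random() < p_switch_state:
            state = (state + 1) % len(p_stay_by_state)
        prev = seq[-1]
        seq.append(prev if rng.random() < p_stay_by_state[state] else other(prev))
    return "".join(seq)


def longest_run(seq, alternating=False):
    """Longest repetition run (or alternation run) in a non-empty sequence."""
    best = cur = 1
    for prev, nxt in zip(seq, seq[1:]):
        extends = (prev == nxt) != alternating
        cur = cur + 1 if extends else 1
        best = max(best, cur)
    return best


def features(seq):
    """A few example descriptors: summary statistics and feature-based ones."""
    n_a = seq.count("A")
    n_b = len(seq) - n_a
    return {
        "mean": n_a / len(seq),                        # summary statistic
        "freq_ratio": min(n_a, n_b) / max(n_a, n_b),   # assumed definition
        "has_repeating_triple": "AAA" in seq or "BBB" in seq,
        "longest_repetition": longest_run(seq),
        "longest_alternation": longest_run(seq, alternating=True),
    }


if __name__ == "__main__":
    rng = random.Random(0)
    for name, seq in [
        ("coin", biased_coin(0.7, rng)),
        ("markov", markov(0.8, rng)),
        ("hierarchical", hierarchical_markov([0.9, 0.2], 0.25, rng)),
    ]:
        print(name, seq, features(seq))
```

A feature dictionary of this kind could serve as the per-string feature vector fed to a classifier, while the raw string corresponds to the "identity of the elements" input; which classifier and which 13 features were actually used is not stated in the abstract.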