Abstract
The pattern of eye movements made during reading is one of the most stereotyped high-level behaviors that humans produce. As readers of a language, we all move our eyes in roughly the same way. Yet might this self-similar behavior hide subtle clues about how a reader is engaging with the material being read? Here we attempt to decode a reader’s eye movements to reveal their level of text comprehension and their mental state, with mental state characterized along four dimensions: interested/bored, awake/sleepy, anxious/relaxed, and confused/clear. Eye movements were recorded from participants reading five published SAT passages. After each passage, the corresponding SAT question was asked, followed by a questionnaire about the reader’s mental state while reading the passage. A sequence of fixation location (x,y), fixation duration, and pupil size features was extracted from the reading behavior and input to a deep neural network (CNN/RNN), which was used to predict the reader’s comprehension (i.e., whether the question was answered accurately) and their questionnaire scores. Specifically, classifiers were trained on labeled reading data (80%) for a given passage and then evaluated on their ability to predict scores from the held-out, unlabeled reading eye movements (the remaining 20%). We also compared our model to an SVM model that used hand-coded features, such as average forward speed and angularity of eye movement, to predict the same comprehension and questionnaire scores. We found that our models successfully predicted most of the scores, in some cases well above chance (classification accuracy >80%), and that the deep network model generally outperformed the SVM model. We conclude that, by learning and using features that code the seriality and nonlinearity of fixations made during reading, a CNN/RNN can decode reading behavior to reveal a reader’s level of comprehension and mental state.
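To make the pipeline described above concrete, the following is a minimal sketch of a CNN/RNN classifier operating on fixation-feature sequences (x, y, duration, pupil size). It is not the authors' implementation; the library (PyTorch), layer sizes, kernel width, and the two-class output (e.g., accurate vs. inaccurate answer) are illustrative assumptions.

```python
# Illustrative sketch only: a 1-D CNN feeding a GRU over fixation sequences.
# All architectural details here are assumptions, not taken from the paper.
import torch
import torch.nn as nn

class ReadingDecoder(nn.Module):
    def __init__(self, n_features=4, n_classes=2, hidden=64):
        super().__init__()
        # Convolution over the fixation sequence: input channels are the
        # four per-fixation features (x, y, duration, pupil size).
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # Recurrent layer captures the serial order of fixations.
        self.rnn = nn.GRU(32, hidden, batch_first=True)
        # Classifier head, e.g., correct vs. incorrect SAT answer.
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):
        # x: (batch, seq_len, n_features); Conv1d expects (batch, channels, seq_len)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        _, last = self.rnn(h)              # final hidden state summarizes the sequence
        return self.head(last.squeeze(0))  # class logits

# Example: a batch of 8 readings, each with 200 fixations and 4 features.
logits = ReadingDecoder()(torch.randn(8, 200, 4))
print(logits.shape)  # torch.Size([8, 2])
```

In such a setup, the same feature sequences could be summarized into hand-coded statistics (e.g., average forward speed) and passed to an SVM to serve as the comparison baseline mentioned in the abstract.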