Abstract
Background: The utility of the random forest algorithm was investigated as a computational framework for extracting the most relevant features from an EEG data set obtained from mild Traumatic Brain Injury (mTBI) patients and Healthy Controls (HC) during a Visuospatial Working Memory (VSWM) task. Methods: The data were obtained from an OpenNeuro repository (RRID: ds003523). The subjects that made up the final groups, mTBI (n = 27) and HC (n = 27), were matched on demographic variables and on their performance in the VSWM task, so that the groups did not differ significantly in age (p = 0.67), sex (p = 0.58), or hit ratio (p = 0.97). Epoched EEG signals covering three memory phases were analyzed to extract 5 frequency components from each scalp site. The EEG data were labelled by group and by trial outcome (correct or incorrect). The random forest algorithm was trained with 60% of the data to build EEG classifiers of VSWM trial accuracy and of diagnosis. Results: Analyses of VSWM task performance by stimulus type and age group yielded only tenuous differences, indicating the need for a more reliable method to distinguish the two groups. The first model classified trial accuracy in HC with 85% accuracy; occipital beta at baseline, parietal theta at baseline, and parietal delta at encoding had among the highest importance values. The second model classified trial accuracy with 78% accuracy when cross-validated on mTBI data and included theta and beta bands from several channels. Model 3, using correct trials only, identified central-parietal beta at retention and posterior occipital gamma and beta at encoding as the primary classifiers of diagnosis, with a classification accuracy of 98%. Model 4, using incorrect trials only, identified central gamma at retention and baseline as the primary classifiers of group membership.
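The general approach described in the Methods (training a random forest on 60% of the trials and ranking features by importance) can be illustrated with a minimal sketch. This is not the study's pipeline: it assumes scikit-learn, uses synthetic data in place of the EEG recordings, and the channel names, the band list (alpha is assumed to be the fifth band), the number of trees, and the injected effect are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical feature layout: one value per scalp site, band, and memory phase.
PHASES = ["baseline", "encoding", "retention"]
BANDS = ["delta", "theta", "alpha", "beta", "gamma"]  # assumed band set
SITES = ["Fz", "Cz", "Pz", "Oz"]  # placeholder channels, not the study's montage
FEATURES = [f"{s}_{b}_{p}" for p in PHASES for b in BANDS for s in SITES]


def make_synthetic_trials(n_trials=600, seed=0):
    """Return random 'band power' features and binary labels (synthetic)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_trials, len(FEATURES)))
    y = rng.integers(0, 2, size=n_trials)
    # Inject an artificial effect into one feature so there is something to find.
    X[:, FEATURES.index("Oz_beta_baseline")] += 2.0 * y
    return X, y


def fit_and_rank(X, y, seed=0):
    """Train on 60% of the trials, score on the rest, rank features."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.6, stratify=y, random_state=seed
    )
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    clf.fit(X_train, y_train)
    accuracy = clf.score(X_test, y_test)  # held-out classification accuracy
    order = np.argsort(clf.feature_importances_)[::-1]  # most important first
    ranking = [(FEATURES[i], float(clf.feature_importances_[i])) for i in order]
    return accuracy, ranking, len(X_train)


X, y = make_synthetic_trials()
accuracy, ranking, n_train = fit_and_rank(X, y)
print(f"trained on {n_train} trials, held-out accuracy {accuracy:.2f}")
for name, importance in ranking[:3]:
    print(f"{name}: {importance:.3f}")
```

In this sketch the split is made at the trial level. With real recordings, trials from the same subject landing in both the training and test sets could inflate accuracy, so a subject-wise split may be preferable.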