Abstract
In letter identification, confusion probabilities are often used to define letter similarity. The goal of this study is to develop a universal, objective method to quantify letter similarity. Because letter similarity may depend on the number of similar features shared by a letter pair, we began by identifying 22 potential letter features that appear universally across everyday fonts. A sans-serif font, Century Gothic, was used for letter feature analysis. Only lowercase letters were considered. Customized MATLAB scripts were used to process letter images and to detect and quantify the features of each letter, after which a similarity score was calculated for each pair of letters. This yielded 22 sets of similarity scores, one for each feature. To assess how much of the variation in letter confusion probability the similarity scores account for, we performed linear regression analyses (considering only the linear terms). Four confusion matrices measured with different fonts under different testing conditions (Bouma, 1971; Geyer, 1977; McGraw et al., 1994) were examined. Although the similarity scores and the confusion matrices were obtained with different fonts, the best-fitting models explained about 30% of the variability in letter confusion across the four confusion matrices. Ten letter features (e.g., number of separated parts, parallelism, and closed area) consistently showed significant contributions to the errors in letter identification. These findings suggest that defining letter similarity in a universal, objective way is feasible.
Meeting abstract presented at VSS 2017
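
The analysis pipeline described in the abstract (per-feature similarity scores, then a linear regression of confusion probability on those scores) can be illustrated with a minimal sketch. This is not the study's MATLAB code. The abstract does not state how similarity scores were computed, so the sketch assumes a simple definition (one minus the normalized absolute difference in a feature value), and the function names, feature values, and confusion matrix below are all hypothetical and for illustration only.

```python
import numpy as np


def pairwise_similarity(feature_values):
    """Similarity matrix for ONE feature.

    Assumed definition (not from the abstract): 1 minus the absolute
    difference in feature value, normalized by the largest difference.
    """
    v = np.asarray(feature_values, dtype=float)
    diff = np.abs(v[:, None] - v[None, :])
    span = diff.max()
    if span == 0:
        # Every letter has the same value, so all pairs are maximally similar.
        return np.ones_like(diff)
    return 1.0 - diff / span


def fit_linear_model(similarity_by_feature, confusion):
    """Regress off-diagonal confusion probabilities on similarity scores.

    Only linear terms (plus an intercept) are used. Returns the
    coefficients (intercept first) and the fraction of variance explained.
    """
    n = confusion.shape[0]
    off_diag = ~np.eye(n, dtype=bool)  # drop correct identifications
    predictors = [s[off_diag] for s in similarity_by_feature]
    X = np.column_stack([np.ones(off_diag.sum())] + predictors)
    y = confusion[off_diag]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ coef
    r_squared = 1.0 - np.sum(residual**2) / np.sum((y - y.mean()) ** 2)
    return coef, r_squared


# Made-up feature values for five letters (illustrative, not measured).
features = {
    "closed_area": [1, 1, 0, 1, 1],
    "separated_parts": [1, 1, 1, 1, 2],
}
similarities = [pairwise_similarity(v) for v in features.values()]

# Synthetic confusion matrix standing in for a published one.
rng = np.random.default_rng(0)
confusion = 0.02 + 0.05 * similarities[0] + rng.normal(0.0, 0.01, (5, 5))

coef, r_squared = fit_linear_model(similarities, confusion)
print(f"variance explained: {r_squared:.2f}")
```

In this sketch the diagonal of the confusion matrix is excluded because it holds correct identifications rather than confusions. With a real confusion matrix, each of the 22 feature-specific similarity matrices would supply one predictor column.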