Abstract
Berlin and Kay (1969) proposed that color naming in all languages evolves along a hierarchically constrained path toward an end state resembling English. We evaluated this proposal using the color naming data in Kay's World Color Survey (WCS; http://www.icsi.berkeley.edu/wcs/data.html). We used k-means clustering to partition the 14,336 distinct chromatic color naming patterns in the WCS. Each pattern is the set of WCS color chips to which one informant assigned a single color name. Patterns were represented as binary vectors, and clustering was based on the pairwise Pearson correlations between them. In our first analysis, we varied the number of clusters, K, from 2 to 10 and found that: 1) for every value of K, the average naming pattern of each cluster glossed to a single English color name or a composite of English names; and 2) the structure of the k-means partitions unfolded in a way reminiscent of Berlin and Kay's hierarchical sequence of color category evolution (the two-category solution gave WARM and COOL; the three-category solution gave RED-or-PINK, YELLOW-or-ORANGE, and COOL; and so on). Gap statistic analysis (Tibshirani, Walther, & Hastie, 2001) identified K = 8 statistically distinct clusters in the WCS data set: RED, GREEN, YELLOW-or-ORANGE, BLUE, PURPLE, BROWN, PINK, and GRUE. “Early” categories such as WARM and COOL no longer appeared as distinct color naming categories at K = 8. This dependence of category structure on K may help explain why different scholars have reached different conclusions about whether evolutionary processes operate in color naming.
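The pipeline summarized above (binary naming patterns, correlation-based k-means, and a gap statistic to choose K) can be illustrated with a minimal NumPy sketch. It is not the authors' code, and several details are assumptions not stated in the abstract: each naming pattern is a 0/1 vector over chips (1 = the informant gave that name to the chip); correlation is handled by centering each vector and scaling it to unit length, so that squared Euclidean distance equals 2(1 − r) and ordinary k-means tracks Pearson correlation; centroids use k-means++ seeding; and the gap statistic uses uniform reference data over each feature's observed range, following the selection rule of Tibshirani, Walther, and Hastie. All function names are hypothetical.

```python
import numpy as np


def correlation_embed(patterns):
    """Center each binary naming pattern and scale it to unit length.

    For two rows a, b prepared this way, ||a - b||^2 = 2 * (1 - r), where r is
    their Pearson correlation, so Euclidean k-means on the output groups
    patterns by correlation. Constant rows (r undefined) stay all zero.
    """
    x = np.asarray(patterns, dtype=float)
    x = x - x.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.where(norms == 0, 1.0, norms)


def within_dispersion(x, labels):
    """W_k: total squared distance of each point to its own cluster mean."""
    return sum(((x[labels == j] - x[labels == j].mean(axis=0)) ** 2).sum()
               for j in np.unique(labels))


def _sq_dists(x, centers):
    # Squared Euclidean distances, shape (n_points, n_centers).
    return ((x ** 2).sum(axis=1)[:, None] - 2 * x @ centers.T
            + (centers ** 2).sum(axis=1)[None, :])


def _kmeanspp_init(x, k, rng):
    # k-means++ seeding: pick each new center with probability ~ squared distance.
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        d2 = np.maximum(_sq_dists(x, np.array(centers)).min(axis=1), 0)
        p = d2 / d2.sum() if d2.sum() > 0 else None
        centers.append(x[rng.choice(len(x), p=p)])
    return np.array(centers)


def kmeans(x, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, W_k) of the best of n_init restarts."""
    rng = np.random.default_rng(seed)
    best_labels, best_w = None, np.inf
    for _ in range(n_init):
        centers = _kmeanspp_init(x, k, rng)
        for _ in range(n_iter):
            labels = _sq_dists(x, centers).argmin(axis=1)
            new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            converged = np.allclose(new, centers)
            centers = new
            if converged:
                break
        labels = _sq_dists(x, centers).argmin(axis=1)
        w = within_dispersion(x, labels)
        if w < best_w:
            best_labels, best_w = labels, w
    return best_labels, best_w


def gap_statistic(x, k_values, n_refs=10, seed=0):
    """Gap(k) = mean_b log W*_kb - log W_k, with uniform reference data
    drawn over each feature's observed range (the simplest reference choice)."""
    rng = np.random.default_rng(seed)
    lo, hi = x.min(axis=0), x.max(axis=0)
    gaps, sks = [], []
    for k in k_values:
        log_w = np.log(kmeans(x, k, seed=seed)[1])
        ref = np.array([np.log(kmeans(rng.uniform(lo, hi, size=x.shape), k, seed=seed)[1])
                        for _ in range(n_refs)])
        gaps.append(ref.mean() - log_w)
        sks.append(ref.std() * np.sqrt(1 + 1 / n_refs))
    return np.array(gaps), np.array(sks)


def choose_k(k_values, gaps, sks):
    """Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}; else the largest k tried."""
    for i in range(len(k_values) - 1):
        if gaps[i] >= gaps[i + 1] - sks[i + 1]:
            return k_values[i]
    return k_values[-1]
```

In use, one would call `x = correlation_embed(patterns)`, then `gaps, sks = gap_statistic(x, range(2, 11))` and `choose_k(list(range(2, 11)), gaps, sks)`. The unit-norm embedding lets a standard Euclidean k-means stand in for a correlation distance; a spherical k-means, which re-normalizes centroids, would be a closer match and is a reasonable alternative.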