To evaluate the importance and contribution of the PLW features, we constructed two classification models with different targets: biological sex (male = 1, female = 0) and perceptual sex (male = 1, female = 0). Reducing the number of features is important in machine learning because an excessive number of features can cause the learning algorithm to overfit to noise rather than to the underlying pattern (Brink, Richards, & Fetherolf, 2016). In the present study, we performed feature selection using forward sequential, backward sequential, and exhaustive search methods, and feature transformation using principal component analysis (PCA). The optimal machine learning algorithm was determined by comparing the performance of six algorithms: discriminant analysis, k-nearest neighbors, naive Bayes, random forest, Gaussian kernel support vector machine (kernel SVM), and ensemble algorithms.
In k-nearest neighbors, there is always a trade-off in setting the value of k. When k is low (e.g., k = 1), the algorithm becomes sensitive to noise in the data, resulting in overfitting. Conversely, when k is high, the algorithm smooths over the true pattern in the data, resulting in underfitting. In the present study, we tested several values and set k to 5. We mitigated the drawback of a higher k by using a distance-weighted k-nearest neighbor approach, in which each neighbor's contribution is weighted by the inverse of its distance from the query point (Kelleher, Mac Namee, & D'Arcy, 2020). To ensure that all features were considered equally, we normalized the data to the range of 0 to 1 (Theodoridis, Pikrakis, Koutroumbas, & Cavouras, 2010). We performed all machine learning modeling using the MATLAB Statistics and Machine Learning Toolbox (R2021a; The MathWorks, Inc., Natick, MA, USA).
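As an illustration of the preprocessing and feature selection steps described above, the following MATLAB sketch (using the same Statistics and Machine Learning Toolbox) rescales each feature to the 0 to 1 range, runs forward sequential feature selection, and applies PCA. The variable names (X, y, Xnorm, selected) and the choice of a cross-validated, distance-weighted k-nearest neighbor misclassification count as the selection criterion are illustrative assumptions, not the exact analysis code used in the study.

% Assumed inputs (illustrative): feature matrix X with one row per walker and
% one column per PLW feature, and binary label vector y (male = 1, female = 0).
Xnorm = normalize(X, 'range');   % rescale each feature to the range [0, 1]

% Selection criterion: misclassification count of a distance-weighted
% 5-nearest-neighbor classifier on the held-out fold.
critfun = @(Xtr, ytr, Xte, yte) ...
    sum(yte ~= predict(fitcknn(Xtr, ytr, 'NumNeighbors', 5, ...
                               'DistanceWeight', 'inverse'), Xte));

% Forward sequential selection with 10-fold cross-validation;
% 'direction', 'backward' gives the backward sequential variant.
[selected, history] = sequentialfs(critfun, Xnorm, y, ...
    'direction', 'forward', 'cv', 10);

% Feature transformation with PCA as an alternative to feature selection.
[coeff, score, ~, ~, explained] = pca(Xnorm);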
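Similarly, a minimal sketch of the distance-weighted k-nearest neighbor classifier (k = 5) and its 10-fold cross-validated accuracy is shown below; Xnorm, selected, and y carry over from the sketch above and are assumed names. The other candidate learners could be fitted with the analogous toolbox functions (fitcdiscr, fitcnb, fitcsvm with a Gaussian kernel, TreeBagger, and fitcensemble).

% Distance-weighted 5-nearest-neighbor classifier: each neighbor's vote is
% weighted by the inverse of its distance to the query point.
knnMdl = fitcknn(Xnorm(:, selected), y, ...
    'NumNeighbors', 5, 'DistanceWeight', 'inverse');

% 10-fold cross-validated classification accuracy.
cvMdl = crossval(knnMdl, 'KFold', 10);
accuracy = 1 - kfoldLoss(cvMdl);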