Glossary
Backprop: Short for “backward propagation of errors,” it is the standard way to apply gradient-descent learning to multilayer networks. It uses the chain rule from calculus to compute the gradient of the cost function with respect to the parameters of each layer, working backward from the output layer.
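The chain-rule computation can be sketched for a tiny two-layer network y = w2·tanh(w1·x) with squared-error cost; the network, weights, and inputs here are illustrative assumptions, not from the glossary.

```python
import math

def forward_backward(x, t, w1, w2):
    # Forward pass: compute each layer's output in turn.
    h = math.tanh(w1 * x)   # hidden layer
    y = w2 * h              # output layer
    cost = (y - t) ** 2
    # Backward pass: apply the chain rule, propagating dC/dy
    # back through the layers to get the gradient of each weight.
    dC_dy = 2 * (y - t)
    dC_dw2 = dC_dy * h                  # dC/dw2 = dC/dy * dy/dw2
    dC_dh = dC_dy * w2
    dC_dw1 = dC_dh * (1 - h ** 2) * x   # tanh'(z) = 1 - tanh(z)**2
    return cost, dC_dw1, dC_dw2
```

The gradients agree with a numerical finite-difference check, which is the usual way to verify a backprop implementation.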
Convexity: A real-valued function is called “convex” if the line segment between any two points on the graph of the function lies on or above the graph (Boyd & Vandenberghe, 2004). A problem is convex if its cost function is convex. Because every local minimum of a convex function is a global minimum, convexity guarantees that gradient descent, with a suitably small step size, converges to the global minimum.
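The chord-above-graph definition can be checked numerically; this sketch samples points along the segment between two inputs (the tolerance and sample count are arbitrary assumptions).

```python
def chord_above_graph(f, x1, x2, steps=100):
    # Check the convexity definition on [x1, x2]: the chord
    # lam*f(x1) + (1-lam)*f(x2) must lie on or above f at
    # every sampled interior point.
    for i in range(steps + 1):
        lam = i / steps
        x = lam * x1 + (1 - lam) * x2
        chord = lam * f(x1) + (1 - lam) * f(x2)
        if chord < f(x) - 1e-12:
            return False
    return True
```

For example, x² passes the check while its concave mirror image -x² fails it.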
Convolutional neural network (ConvNet): Rooted in the Neocognitron (Fukushima, 1980) and inspired by the simple and complex cells described by Hubel and Wiesel (1962), ConvNets apply backprop learning to multilayer neural networks based on convolution and pooling (LeCun et al., 1989; LeCun, Bottou, Bengio, & Haffner, 1998).
Cost function: A function that assigns a real number representing cost to a candidate solution by measuring the difference between the solution and the desired output. Solving by optimization means minimizing cost.
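One common choice of cost function, mean squared error, illustrates the definition; the glossary entry does not name a specific cost, so this is an assumed example.

```python
def mse(predictions, targets):
    # Average squared difference between candidate outputs and
    # desired outputs; zero only when they match exactly.
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
```

Minimizing this number over the model's parameters is what "solving by optimization" means in the entry above.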
Cross-validation: A procedure for estimating how well a network generalizes from the data it was trained on to new data. The data are repeatedly partitioned into a training portion and a held-out validation portion, and performance on the held-out portions is averaged.
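The partitioning step of k-fold cross-validation (k is an assumed parameter, not fixed by the entry) can be sketched as an index generator:

```python
def k_fold_splits(n_samples, k):
    # Yield (train, validation) index lists: each of the k folds
    # is held out once while the model trains on the rest.
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for i in range(k):
        start = i * fold_size
        stop = (i + 1) * fold_size if i < k - 1 else n_samples
        val = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, val
```

Averaging the validation scores over the k folds gives the generalization estimate.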
Deep learning: A successful and popular version of machine learning that uses backprop neural networks with multiple hidden layers. The 2012 success of AlexNet, then the best machine learning network for object recognition, was the tipping point. Deep learning is now ubiquitous on the Internet. The idea is to have each layer of processing perform successively more complex computations on the data to give the full multilayer network more expressive power. The drawback is that multilayer networks are much harder to train (Goodfellow et al., 2016).
Generalization: How well a classifier performs on new, unseen examples that it did not see during training.
Gradient descent: An algorithm that minimizes cost by incrementally changing the parameters in the direction of steepest descent of the cost function.
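A minimal sketch of gradient descent on an assumed convex cost C(w) = (w - 3)², whose gradient is 2(w - 3) and whose minimum is at w = 3; the learning rate and step count are illustrative choices.

```python
def gradient_descent(grad, w0, lr=0.1, steps=200):
    # Repeatedly step against the gradient, i.e., in the
    # direction of steepest descent of the cost function.
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w
```

Because this cost is convex, the iterates converge to the global minimum, as the convexity entry above notes.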
Hebbian learning: According to Hebb's rule, the efficiency of a synapse increases after correlated pre- and postsynaptic activity. In other words, neurons that fire together, wire together (Lowel & Singer, 1992). Spike-timing-dependent plasticity (Caporale & Dan, 2008) is a temporally precise form of Hebbian learning in which the sign of the change depends on the relative timing of pre- and postsynaptic spikes.
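Hebb's rule can be sketched as a weight update proportional to the product of pre- and postsynaptic activity; the learning rate eta is an assumed parameter.

```python
def hebbian_update(w, pre, post, eta=0.1):
    # Correlated activity (both units active together) strengthens
    # the connection: "fire together, wire together."
    return w + eta * pre * post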
Machine learning: Any computer algorithm that learns how to perform a task directly from examples, without a human providing explicit instructions or rules for how to do so. In one type of machine learning, called “supervised learning,” correctly labeled examples are provided to the learning algorithm, which is then “trained” (i.e., its parameters are adjusted) to perform the task correctly on its own and generalize to unseen examples.
Neural nets: Computing systems inspired by biological neural networks, consisting of interconnected units (“neurons”) that learn the strengths of their connections from examples in order to solve tasks.
Supervised learning: Any algorithm that accepts a set of labeled stimuli—a training set—and returns a classifier that can label stimuli similar to those in the training set.
Support vector machine (SVM): A type of machine learning algorithm for classification. An SVM uses the “kernel trick” to efficiently learn a nonlinear classification by finding a boundary in a high-dimensional space that separates the classes while maximizing the distance of the nearest class exemplars to the boundary (Cortes & Vapnik, 1995).
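The heart of the kernel trick is that a kernel function returns an inner product in a high-dimensional feature space without ever constructing that space. A common choice, the RBF (Gaussian) kernel, is sketched below; the entry does not name a specific kernel, and gamma is an assumed hyperparameter.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    # Similarity between two points that equals an inner product in an
    # (implicit) infinite-dimensional feature space; no explicit
    # feature map is ever computed.
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)
```

An SVM replaces every inner product in its optimization problem with such a kernel evaluation, which is what makes nonlinear boundaries cheap to learn.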
Unsupervised learning: Discovers structure and redundancy in data without labels. It is less widely used by computer scientists than supervised learning, but of great interest because labeled data are scarce while unlabeled data are plentiful.