In computational neuroscience, there is a tradition of exploiting recent developments in statistical modeling and algorithms to build more sophisticated models of visual processing (
Simoncelli & Olshausen, 2001). In this respect, recent developments in artificial intelligence (AI), and in particular deep learning (DL), have offered significant contributions over the last decade (
Hassabis et al., 2017), showing promising results in efforts to improve our understanding of visual counterstream computations (
Kietzmann et al., 2019;
Qiao et al., 2019). In the DL field, and more broadly in AI, a textbook distinction in the literature is between supervised and unsupervised learning: the former learns statistical representations from labelled datasets, for tasks such as classification, regression, and segmentation, while the latter extracts features and inherent patterns from unlabelled data, for tasks such as clustering, anomaly detection, and dimensionality reduction (
Hastie et al., 2009). Whereas supervised learning usually yields good performance on a predefined task, making sense of the large amounts of unlabelled data often available is a promising and exciting goal for the field. In this respect, the recent trend of self-supervised learning uses the data itself as a supervision signal (
Jing & Tian, 2020); the idea is to challenge the model with a specific pretext task so that it learns representations of the data that can later be applied to supervised tasks, such as object classification, or to automatically label the dataset. Examples include learning to predict one part of an image from other parts, predicting the relative locations of two image patches, solving a jigsaw puzzle, and colorizing an image. Originally designed as unsupervised methods, generative adversarial networks (
Goodfellow et al., 2014) have proven successful in supervised and reinforcement learning tasks; in a generative adversarial network, a discriminator is trained to judge whether the data created by a generator belongs to the training data (i.e., is a real image) or not (is a fake). Despite their great promise for understanding data, unsupervised learning methods are seldom used for model comparisons with brain data.
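To make the pretext-task idea concrete, the following minimal sketch (the data, the masking scheme, and the use of closed-form ridge regression as a stand-in for a trained network are all invented for illustration) trains a model to predict a masked portion of each input from the visible context, so that the supervision targets come from the data itself and no external labels are needed:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy dataset: noisy 1-D "signals" (one per row); no external labels anywhere.
t = np.linspace(0, 2 * np.pi, 32)
phase = rng.uniform(0, 2 * np.pi, size=(500, 1))
signals = np.sin(t + phase) + 0.05 * rng.normal(size=(500, 32))

# Pretext task: predict the masked centre of each signal from the visible
# context -- the supervision targets are taken from the data itself.
masked_cols = np.arange(12, 20)
context = np.delete(signals, masked_cols, axis=1)   # model inputs
target = signals[:, masked_cols]                    # "labels" from the data

# Closed-form ridge regression as a stand-in for a trained network.
lam = 1e-3
W = np.linalg.solve(context.T @ context + lam * np.eye(context.shape[1]),
                    context.T @ target)

pred = context @ W
mse = np.mean((pred - target) ** 2)
```

The learned mapping `W` plays the role of the representation acquired during the pretext phase; in a deep self-supervised pipeline, the analogous encoder would afterwards be reused or fine-tuned for a downstream supervised task.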
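The generator-discriminator interplay described above can likewise be sketched in a deliberately minimal form; the 1-D Gaussian data, the affine generator, the logistic discriminator, and the finite-difference gradients are all toy choices made here to keep the example dependency-free, not the original formulation of Goodfellow et al. (2014):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_real(n):
    # Toy "real" data: samples from N(4, 1).
    return rng.normal(4.0, 1.0, size=n)

def generator(z, theta):
    # Generator: affine map of noise z ~ N(0, 1); params theta = (a, b).
    a, b = theta
    return a * z + b

def discriminator(x, phi):
    # Discriminator: logistic regression on scalars; params phi = (w, c).
    w, c = phi
    return sigmoid(w * x + c)

def d_loss(phi, x_real, x_fake):
    # Binary cross-entropy: real samples labelled 1, generated samples 0.
    eps = 1e-8
    return -(np.mean(np.log(discriminator(x_real, phi) + eps))
             + np.mean(np.log(1.0 - discriminator(x_fake, phi) + eps)))

def g_loss(theta, phi, z):
    # Non-saturating generator loss: try to fool the discriminator.
    eps = 1e-8
    return -np.mean(np.log(discriminator(generator(z, theta), phi) + eps))

def num_grad(f, p, h=1e-5):
    # Central finite-difference gradient, to keep the sketch self-contained.
    g = np.zeros_like(p)
    for i in range(len(p)):
        e = np.zeros_like(p)
        e[i] = h
        g[i] = (f(p + e) - f(p - e)) / (2 * h)
    return g

theta = np.array([1.0, 0.0])   # generator starts at N(0, 1)
phi = np.array([0.1, 0.0])
lr = 0.05
for step in range(300):
    z = rng.normal(size=64)
    x_real, x_fake = sample_real(64), generator(z, theta)
    phi -= lr * num_grad(lambda p: d_loss(p, x_real, x_fake), phi)
    theta -= lr * num_grad(lambda t: g_loss(t, phi, z), theta)

fake_mean = generator(rng.normal(size=5000), theta).mean()
```

Alternating the two updates captures the adversarial dynamic in the text: the discriminator sharpens its real-versus-fake decision while the generator shifts its output distribution toward the real data.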