Visual masking refers to a phenomenon in which a briefly presented stimulus (the target), clearly visible when shown alone, is rendered less visible or invisible by another stimulus (the mask) presented in close spatiotemporal proximity. Masking is a powerful tool for regulating the visibility of a target. It is also a useful method for studying basic visual processes (B. Breitmeyer & Öğmen, 2006; see review by Ansorge, Francis, Herzog, & Öğmen, 2007). Yet despite decades of research, there is still no consensus on the underlying causes of masking. Previous studies have focused mainly on the spatiotemporal aspects of masking. They suggest that the strength of masking depends critically on the spatial distance and the time interval between target and mask (e.g., B. Breitmeyer & Öğmen, 2006; Polat, Sterkin, & Yehezkel, 2007).

Over the past two decades, however, a special type of masking, object substitution masking (OSM), has been reported not to follow these strict spatiotemporal relationships. OSM has therefore been considered to reflect masking at the object level (Enns & Di Lollo, 1997, 2000; Goodhew, Pratt, Dux, & Ferber, 2013). Two common forms of OSM are four-dot masking and common-onset masking. Four-dot masking uses a backward-masking paradigm in which the target is followed by four dots that surround it, leaving a relatively large distance between the target and mask contours (Enns & Di Lollo, 1997). In common-onset masking, the target and the mask appear simultaneously, but the mask remains on display after the target disappears (Jannati, Spalek, & Di Lollo, 2013). These two forms of OSM are often combined (Enns & Di Lollo, 2000; Gellatly, Pilling, Cole, & Skarratt, 2006). OSM effects are thought to arise at a higher level of processing, namely the level of object representation, rather than at a lower level such as local contour interactions. There are two major accounts of OSM: object substitution and object updating. The object-substitution account holds that masking occurs when a separate mask representation replaces that of the target. The object-updating account instead proposes that masking results from the target being updated by the mask within a single object representation (see review by Goodhew, 2017). Of the two, the object-updating account is supported by far stronger evidence (Goodhew, 2017).

Within the framework of the object-updating theory, an object representation is first established for the target and later updated by new input (the mask) if the mask is perceived as the same object as the target (Enns, Lleras, & Moore, 2009; Harrison, Rajsic, & Wilson, 2016; Lleras & Enns, 2004; Moore & Lleras, 2005). If the mask is instead treated as a different object, the target representation is not updated, and the initial target information remains unchanged; that is, the target escapes object-level masking. A key question for the object-updating theory, therefore, is which stimulus attributes determine whether the target and the mask share an object representation. Gellatly et al. (2006) addressed this question by separating the different object-level components of OSM and testing which dimensions of target–mask similarity affect object-level masking. However, they found no dimension of target–mask similarity that significantly affected masking of the target as a whole. Instead, their results showed that OSM can operate at the level of independent features. Specifically, target–mask similarity on a particular dimension, such as color, affected only the report of that dimension. As Gellatly et al. pointed out, empirically distinguishing between object-level and feature-level OSM effects remains a major challenge.