As shown in the Proximity and Similarity Principle Simulation section, the proposed model can simulate the grouping of dot patterns under the gestalt laws of similarity and proximity, so it should also be able to perform image segmentation. In this section, the two-reach method proposed in the Method section is applied to segment images.
Each image is first resized to 500 pixels in its larger dimension, and the superpixel map
\( S = \lbrace s_1,s_2,...,s_n\rbrace \) is obtained using SLIC (
Achanta et al., 2012) with size parameter
\( regionSize = 50\) and regularization parameter
\( regularizer = 500\). Then, five features
\( f=\lbrace f_p, f_c\rbrace \), including the average position
\( f_p = \lbrace x,y\rbrace \) and the color feature
\( f_c = \lbrace r, g, b\rbrace \) in color space, are extracted for each superpixel and normalized to
\( [0,1]\).
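The feature extraction step can be illustrated with a minimal sketch. It assumes that the superpixel label map has already been produced by some SLIC implementation and that the image is given as nested lists of 8-bit RGB tuples. The function name `superpixel_features` is hypothetical, and the normalization (positions divided by the largest pixel index, colors divided by 255) is an assumption, since the text only states that the features are normalized to \( [0,1]\).

```python
def superpixel_features(image, labels):
    """Average position and color per superpixel, scaled to [0, 1].

    image  : H x W nested list of (r, g, b) tuples with values in 0..255
    labels : H x W nested list of integer superpixel ids (e.g., from SLIC)
    Returns a dict mapping each id to [x, y, r, g, b].
    """
    height, width = len(labels), len(labels[0])
    sums = {}    # id -> running sums of [x, y, r, g, b]
    counts = {}  # id -> number of pixels in the superpixel
    for y in range(height):
        for x in range(width):
            lab = labels[y][x]
            r, g, b = image[y][x]
            acc = sums.setdefault(lab, [0.0] * 5)
            for i, value in enumerate((x, y, r, g, b)):
                acc[i] += value
            counts[lab] = counts.get(lab, 0) + 1
    # Assumed normalization: positions by the largest index, colors by 255.
    scale = (max(width - 1, 1), max(height - 1, 1), 255.0, 255.0, 255.0)
    return {
        lab: [s / counts[lab] / sc for s, sc in zip(acc, scale)]
        for lab, acc in sums.items()
    }
```

Each returned vector concatenates \( f_p\) and \( f_c\), so it can be passed directly to a distance function over the five feature dimensions.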
The position features represent the proximity of superpixels, and the color features represent their appearance similarity. Because an RGB image yields five feature dimensions (two for location and three for color), k-NN with
\( k=5\) is adopted instead of Delaunay triangulation to construct the neighborhood graph, since Delaunay triangulation is slow and unstable on high-dimensional data. Image segmentation results are usually not unique and may vary from coarse to fine (
Pablo et al., 2011). The simulation experiment of proximity and similarity described in the previous section showed that balancing the weightings of the position and color features makes it feasible to obtain results at various scales.
The feature distance between superpixels
\( s_i\) and
\( s_j\) is computed as
\begin{eqnarray}
dist(s_i,s_j)\, &=&\, \Vert \Delta f\Vert _2 \nonumber \\
&=& \sqrt{\Vert w_f\cdot \Delta f_p\Vert ^2_2 + \Vert \Delta f_c \Vert ^2_2},
\end{eqnarray}
where
\( w_f\) is a balance parameter between regional proximity and appearance similarity, and
\( \Delta f\) is the difference between the average feature values of two superpixels, comprising the location and color features. Different balance weightings
\( w_f\) yield different neighborhood graphs, any of which can be used with the proposed method. Finally, the proposed two-reach method merges two superpixels if they are spatially close to one another and have a similar appearance.
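The weighted distance and the k-NN neighborhood graph can be sketched as follows. This is a minimal brute-force illustration, not the authors' implementation: the function names `feature_distance` and `knn_graph` are hypothetical, each feature vector is assumed to be ordered as `[x, y, r, g, b]`, and the graph is assumed to be directed (each node simply lists its k nearest neighbors).

```python
import math


def feature_distance(fi, fj, w_f):
    """Distance between two [x, y, r, g, b] vectors; w_f scales the position part."""
    pos = sum((w_f * (a - b)) ** 2 for a, b in zip(fi[:2], fj[:2]))
    col = sum((a - b) ** 2 for a, b in zip(fi[2:], fj[2:]))
    return math.sqrt(pos + col)


def knn_graph(features, w_f, k=5):
    """Brute-force k-NN: for each node, the indices of its k nearest other nodes."""
    graph = []
    for i, fi in enumerate(features):
        ranked = sorted(
            (feature_distance(fi, fj, w_f), j)
            for j, fj in enumerate(features)
            if j != i
        )
        graph.append([j for _, j in ranked[:k]])
    return graph
```

Under these assumptions, a small \( w_f\) lets color differences dominate the neighbor ranking, while a large \( w_f\) makes spatial position dominate, so the same set of superpixels can produce different graphs.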
The visual results of the proposed model with different
\( w_f\) are shown in
Figure 10. Different balance weightings yield various plausible image segmentation results, and those obtained with
\( w_f \in \lbrace 0.1, 0.2 \rbrace\) are relatively finer.
Compared with the human-labeled salient objects from
Achanta et al. (2009), segmentation with
\( w_f = 0.5\) can correctly segregate the object from the background in each image, which shows the practical potential of the proposed model in computer vision.
Our model can also accomplish edge detection.
Figure 11 shows some visual examples in which the edge maps yield relatively good segmentations for each image, and the average edge maps are quite consistent with the human-labeled edge ground truth from
Arbelaez et al. (2010). However, row (d) in the figure exhibits a difficult case, where the leaves in the background are not well segmented owing to the limited ability of our model.
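One way to derive edge maps from segmentations is sketched below. The text does not specify how the average edge maps are computed, so this sketch rests on two assumptions: a pixel is marked as an edge when its label differs from its right or lower neighbor, and the average is taken over several segmentations of the same image (for example, those obtained with different \( w_f\)). The function names are hypothetical.

```python
def boundary_map(labels):
    """Mark pixels whose segment label differs from the right or lower neighbor."""
    height, width = len(labels), len(labels[0])
    edges = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            right = x + 1 < width and labels[y][x] != labels[y][x + 1]
            below = y + 1 < height and labels[y][x] != labels[y + 1][x]
            if right or below:
                edges[y][x] = 1.0
    return edges


def average_edge_map(label_maps):
    """Pixel-wise mean of the boundary maps of several same-sized segmentations."""
    maps = [boundary_map(m) for m in label_maps]
    height, width = len(maps[0]), len(maps[0][0])
    return [
        [sum(m[y][x] for m in maps) / len(maps) for x in range(width)]
        for y in range(height)
    ]
```

With this construction, boundaries that persist from fine to coarse segmentations receive values close to 1, while boundaries that appear only in the finest results receive lower values.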
Because our model is a prototype of a segmentation method transferred from perceptual grouping, several deficiencies remain to be addressed. One is that only location and color features (i.e., low-level or mid-level features) were used in our experiment; more sophisticated features (e.g., texture, orientation, luminance, and even high-level cues) may improve the segmentation ability of our method. Additionally, the parameters, such as the region size, the number of superpixels, the thresholds \( th_{ID1}\) and \( th_{ID2}\), and the weighting \( w_f\), and even the method for distance calculation, can be adjusted more flexibly for better segmentation performance.