Our feature-tracking mechanism simulation has two modules. In the first module, the Harris corner detector exploits the fact that, at corners and sharp curvatures, the image intensity changes along multiple directions, and uses this property to extract salient features. First, we computed the change in the intensity of a part of the image by shifting a small image patch in all directions and taking the difference between the patch and the shifted one. Suppose that $I(x,y)$ is the image intensity at position $(x,y)$, and consider a small image patch (receptive field) with a Gaussian window, $(x,y) \in W$, where $W$ is a Gaussian window of size $5 \times 5$ pixels. If the window is shifted by $(\Delta x, \Delta y)$, the sum of squared differences (SSD) between the two image patches is
\begin{eqnarray*}
E\left( \Delta x, \Delta y \right) = \sum_{x,y} W\left( x,y \right) {\left( I\left( x,y \right) - I\left( x + \Delta x, y + \Delta y \right) \right)}^2
\end{eqnarray*}
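As an illustration, this windowed SSD can be computed as follows (a minimal sketch assuming NumPy; the window parameters and the test image are illustrative, not taken from the simulation):

```python
import numpy as np

def gaussian_window(size=5, sigma=1.0):
    """Normalized 2-D Gaussian weighting window W(x, y)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    w = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return w / w.sum()

def ssd(image, cx, cy, dx, dy, size=5, sigma=1.0):
    """E(dx, dy): weighted SSD between the patch centered at (cx, cy)
    and the same patch shifted by (dx, dy)."""
    w = gaussian_window(size, sigma)
    r = size // 2
    patch = image[cy - r:cy + r + 1, cx - r:cx + r + 1]
    shifted = image[cy + dy - r:cy + dy + r + 1, cx + dx - r:cx + dx + r + 1]
    return np.sum(w * (patch - shifted) ** 2)

# Toy image with a vertical edge at column 10.
img = np.zeros((20, 20))
img[:, 10:] = 1.0
print(ssd(img, 5, 10, 1, 0))       # uniform region: 0.0
print(ssd(img, 10, 10, 1, 0) > 0)  # shifting across the edge: True
```

A shift inside a uniform region leaves the SSD at zero, while any shift across the edge produces a positive SSD, which is exactly the property the detector exploits.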
From the first-order Taylor expansion,
\begin{eqnarray}
I\left( x + \Delta x, y + \Delta y \right) \approx I\left( x,y \right) + I_x\left( x,y \right)\Delta x + I_y\left( x,y \right)\Delta y \qquad
\end{eqnarray}
The SSD will be approximated by
\begin{eqnarray}
E\left( \Delta x, \Delta y \right) &\approx& \sum_{x,y} W\left( x,y \right) {\left( I\left( x,y \right) - \left( I\left( x,y \right) + I_x\left( x,y \right)\Delta x + I_y\left( x,y \right)\Delta y \right) \right)}^2 \nonumber\\
&=& \sum_{x,y} W\left( x,y \right) {\left( I_x\left( x,y \right)\Delta x + I_y\left( x,y \right)\Delta y \right)}^2 \qquad
\end{eqnarray}
which can be written in matrix form:
\begin{eqnarray}
E\left( \Delta x, \Delta y \right) &\approx& \left( \begin{array}{cc} \Delta x & \Delta y \end{array} \right) \sum_{x,y} W\left( x,y \right) \left( \begin{array}{cc} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{array} \right) \left( \begin{array}{c} \Delta x \\ \Delta y \end{array} \right) \nonumber\\
&=& \left( \begin{array}{cc} \Delta x & \Delta y \end{array} \right) {\boldsymbol{M}} \left( \begin{array}{c} \Delta x \\ \Delta y \end{array} \right) \nonumber\\
&=& \left( \begin{array}{cc} \Delta x & \Delta y \end{array} \right) {\boldsymbol{U}} {{\bf \Lambda }} {{\boldsymbol{U}}^{T}} \left( \begin{array}{c} \Delta x \\ \Delta y \end{array} \right) \qquad
\end{eqnarray}
where $U$ is an orthonormal matrix containing the two eigenvectors of $M$, and ${{\bf \Lambda }} = \left( \begin{array}{cc} \lambda_1 & 0 \\ 0 & \lambda_2 \end{array} \right)$ is a diagonal matrix containing the two eigenvalues. The eigenvalues quantify the change in image intensity along the eigenvectors and depend on the local image structure.
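A sketch of the windowed structure tensor $\boldsymbol{M}$ and its eigenvalues (assuming NumPy; `np.gradient` stands in for whatever derivative filters the simulation used, and the test images are illustrative):

```python
import numpy as np

def structure_tensor_eigs(image, cx, cy, size=5, sigma=1.0):
    """Eigenvalues (lambda1 >= lambda2) of the Gaussian-windowed
    structure tensor M at pixel (cx, cy)."""
    Iy, Ix = np.gradient(image.astype(float))  # gradients along rows, cols
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    w = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    w /= w.sum()
    r = size // 2
    sl = np.s_[cy - r:cy + r + 1, cx - r:cx + r + 1]
    gx, gy = Ix[sl], Iy[sl]
    M = np.array([[np.sum(w * gx * gx), np.sum(w * gx * gy)],
                  [np.sum(w * gx * gy), np.sum(w * gy * gy)]])
    lam = np.linalg.eigvalsh(M)  # ascending order
    return lam[1], lam[0]

# Vertical edge: lambda1 large, lambda2 ~ 0.
img = np.zeros((20, 20))
img[:, 10:] = 1.0
l1, l2 = structure_tensor_eigs(img, 10, 10)
print(l1 > 1e-3, abs(l2) < 1e-9)  # True True

# Corner of a bright quadrant: both eigenvalues large.
img2 = np.zeros((20, 20))
img2[10:, 10:] = 1.0
c1, c2 = structure_tensor_eigs(img2, 10, 10)
print(c1 > 1e-3, c2 > 1e-3)  # True True
```

The two test cases reproduce the edge and corner regimes described in the text.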
Figure A1 shows those properties. In a uniform region (left), the image intensity within the receptive field does not change as the receptive field is displaced ($\lambda_1$ and $\lambda_2$ are close to zero). When the receptive field is close to an edge, the image intensity changes in the direction perpendicular to the contour but not along it ($\lambda_1$ is large, but $\lambda_2$ is small). At a corner, however, the image intensity changes in all directions (both $\lambda_1$ and $\lambda_2$ are large). We quantified features (corners) by the following equation:
\begin{eqnarray}
h &=& \lambda_1 \lambda_2 - K{\left( \lambda_1 + \lambda_2 \right)}^2 \nonumber\\
&=& \det\left( {{\bf \Lambda }} \right) - K\, {\rm tr}{\left( {{\bf \Lambda }} \right)}^2 \qquad
\end{eqnarray}
where ${\rm tr}$ is the trace of the matrix, and the constant $K$ is set to 0.05 (the MATLAB default). We extracted local features where $h > 0.05$. Suppose that an extracted local feature satisfying $h > 0.05$ is $L_F(cx_t, cy_t)$, where $cx_t$ and $cy_t$ are the center of a $5 \times 5$ pixel extracted feature at time $t$; this local feature is then cross-correlated with the succeeding frame to extract the feature's next location:
\begin{eqnarray}
\left( cx_{t+1}, cy_{t+1} \right) = \mathop{{\rm argmax}}\limits_{x,y} \left( \sum\limits_u \sum\limits_v I\left( u,v,t+1 \right) L_F\left( x+u, y+v \right) \right) \qquad
\end{eqnarray}
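The corner response and the cross-correlation tracking step can be sketched together as follows (assuming NumPy; restricting the argmax to a small search window around the previous location is our simplification, not stated in the text):

```python
import numpy as np

K = 0.05  # sensitivity constant (the MATLAB default used in the text)

def corner_response(lam1, lam2, k=K):
    """h = lambda1*lambda2 - k*(lambda1 + lambda2)^2."""
    return lam1 * lam2 - k * (lam1 + lam2) ** 2

def track_feature(next_frame, feature_patch, cx, cy, search=3):
    """Cross-correlate a 5x5 feature patch L_F with the next frame and
    return the (cx, cy) maximizing the correlation within +-search pixels."""
    r = feature_patch.shape[0] // 2
    best, best_xy = -np.inf, (cx, cy)
    for y in range(cy - search, cy + search + 1):
        for x in range(cx - search, cx + search + 1):
            window = next_frame[y - r:y + r + 1, x - r:x + r + 1]
            score = np.sum(window * feature_patch)
            if score > best:
                best, best_xy = score, (x, y)
    return best_xy

# Track a bright 5x5 blob that moved 2 pixels to the right.
frame_t = np.zeros((30, 30)); frame_t[10:15, 10:15] = 1.0
frame_t1 = np.zeros((30, 30)); frame_t1[10:15, 12:17] = 1.0
patch = frame_t[10:15, 10:15]
print(track_feature(frame_t1, patch, 12, 12))  # -> (14, 12)
```

Note that $h > 0$ only when both eigenvalues are substantial: `corner_response(1.0, 1.0)` is positive, while `corner_response(1.0, 0.0)` is negative, so edges are rejected.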
The second module in feature tracking combines the outputs from motion-energy units into PDS mechanisms (Figure 7A, top). The input to this unit is the motion-energy vector flow given by Equations A11 and A12, considered only at the locations $(cx_t, cy_t)$ extracted by the Harris corner detector. For each vector, the ambiguity resides along the line perpendicular to the vector, so we assumed the observer's measurement is ambiguous. Thus, the $i$th measurement distribution given the local velocity, ${\boldsymbol{\vec{v}}}$, is represented by a Gaussian distribution:
\begin{eqnarray}
p\left( S_i, \Theta_i \,|\, {\boldsymbol{\vec{v}}}\left( cx_t, cy_t, t \right) \right) \propto \exp\left( -\frac{1}{2\sigma^2} {\left( \sin\Theta_i\, v_x + \cos\Theta_i\, v_y - S_i \right)}^2 \right) \qquad
\end{eqnarray}
Then, we combine the observer's measurement with the slowest motion prior, represented as follows:
\begin{eqnarray}
p\left( {\boldsymbol{\vec{v}}} \right) \propto \exp\left( -\frac{v_x^2 + v_y^2}{2\sigma_p^2} \right) \qquad
\end{eqnarray}
where $\sigma_p = 2$ (arbitrarily chosen). The posterior probability over a velocity is computed using Bayes' rule:
\begin{eqnarray*}
p\left( {\boldsymbol{\vec{v}}} \,|\, S_1, \Theta_1, \ldots, S_n, \Theta_n \right) \propto p\left( {\boldsymbol{\vec{v}}} \right) p\left( S_1, \Theta_1, \ldots, S_n, \Theta_n \,|\, {\boldsymbol{\vec{v}}} \right)
\end{eqnarray*}
By assuming conditional independence,
\({S_1},\ {{\rm{\Theta }}_1} \bot {S_2},\ {{\rm{\Theta }}_2}\ \ldots \bot {S_n},\ {{\rm{\Theta }}_n}|\vec{v}\), the equation becomes:
\begin{eqnarray}
p\left( {\boldsymbol{\vec{v}}} \,|\, S_1, \Theta_1, \ldots, S_n, \Theta_n \right) \propto p\left( {\boldsymbol{\vec{v}}} \right) \prod\limits_{i=1}^{n} p\left( S_i, \Theta_i \,|\, {\boldsymbol{\vec{v}}} \right) \qquad
\end{eqnarray}
Then, because the local velocity is estimated as the maximum a posteriori (MAP) value, the velocity field from PDS becomes
\begin{eqnarray}
Q_{PDS}\left( cx_t, cy_t, t \right) = \mathop{{\rm argmax}}\limits_{{\boldsymbol{\vec{v}}}} \left( p\left( {\boldsymbol{\vec{v}}} \right) \prod\limits_{i=1}^{n} p\left( S_i, \Theta_i \,|\, {\boldsymbol{\vec{v}}} \right) \right) \qquad
\end{eqnarray}
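This MAP estimate can be sketched numerically by brute-force search over a velocity grid (assuming NumPy; the measurement values, noise parameters, and grid are illustrative, and the paper does not specify the optimization method):

```python
import numpy as np

def map_velocity(S, Theta, sigma=0.1, sigma_p=2.0, vmax=3.0, n=121):
    """Grid-search MAP estimate of (vx, vy) given 1-D measurements S_i
    along directions Theta_i, combined with the slow-motion prior."""
    v = np.linspace(-vmax, vmax, n)
    vx, vy = np.meshgrid(v, v)
    log_post = -(vx**2 + vy**2) / (2 * sigma_p**2)  # log prior
    for s, th in zip(S, Theta):
        # log likelihood of measurement i (ambiguous along one direction)
        log_post -= (np.sin(th) * vx + np.cos(th) * vy - s) ** 2 / (2 * sigma**2)
    idx = np.unravel_index(np.argmax(log_post), log_post.shape)
    return vx[idx], vy[idx]

# Two constraints consistent with (vx, vy) = (1, 0):
# theta = pi/2 constrains vx = 1; theta = 0 constrains vy = 0.
vx_hat, vy_hat = map_velocity(S=[1.0, 0.0], Theta=[np.pi / 2, 0.0])
print(round(vx_hat, 1), round(vy_hat, 1))  # -> 1.0 0.0
```

With a single measurement the posterior ridge runs along the constraint line and the slow-motion prior pulls the MAP toward the origin; with several independent constraints the intersection dominates, as in the product over $i$ above.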
The vector field $Q_{FT}$ combines the outputs of these two feature-tracking units by a vector sum:
\begin{eqnarray}
{Q_{FT}} = {Q_{ME}} + {Q_{PDS}} \qquad
\end{eqnarray}