In an ideal Bayesian framework, one would compute the full posterior distribution over the latent variables encoding object shape, \(p(A|R)\), given by
\begin{eqnarray}
p(A|R) \propto \sum _X p(R|X, S = DA)\, p(X)\, p(A), \quad
\end{eqnarray}
where \(p(R|X, S)\) is the probabilistic (Poisson) model of spike generation (Appendix Equation 11). The posterior \(p(A|R)\) assigns a probability to every possible stimulus pattern \(S = DA\) given the spikes \(R\) coming from the retina, taking into account all possible eye movement trajectories weighted by their probability. We use a series of approximations to derive a computationally tractable, causal, and online computation to estimate \(A\) (see the Appendix for details). First, only the most probable set of latent shape variables is considered, \(\hat{A} = {\rm argmax}_A\, p(A|R)\). Second, the intractable sum over all possible eye trajectories is handled with an online approximation of the EM algorithm. The EM algorithm maximizes \(\log p(A|R)\) iteratively by alternating between two steps: one estimates \(X\) by introducing a variational distribution \(q(X)\), and the other estimates \(A\). To make time explicit in \(X\) and \(R\), we henceforth write them as \(X_{0:T} = (X_0, X_1, \ldots, X_T)\) and \(R_{0:T} = (R_0, R_1, \ldots, R_T)\), where \(T\) is the total number of time steps in the simulation. \(R_t\) denotes the number of spikes emitted by each RGC in the time interval \([t, t + \Delta t]\). Because \(R_t\) depends only on the current eye position, \(X_t\), and the stimulus, \(S\), we can derive a set of EM update equations as follows:
\begin{eqnarray}
q_t(X_t) \leftarrow p(X_t|R_{0:T}, S=DA^{\prime }) \quad
\end{eqnarray}
\begin{eqnarray}
\hskip-25pt &&A^{\prime } \leftarrow {\rm argmax}_{A} \left[\sum _t \sum _{X_t}q_t(X_t) \log p(R_t|X_t, S = DA)\right.\nonumber\\
\hskip-25pt&&\quad\left. +\, \log p(A)\vphantom{\sum_t} \right]
\end{eqnarray}
A full derivation is given in Appendix Equations 31-34.
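To make the M-step in Equation 3 concrete, the following is a minimal Python sketch under strong simplifying assumptions that are not taken from the paper: a one-dimensional circular pattern, \(D\) equal to the identity (so \(S = A\)), one hypothetical neuron per pixel with Poisson rate \(\exp(S_i)\) per time bin, and a flat prior \(p(A)\). Under these assumptions the maximization has a closed form. The names `m_step` and `expected_loglik` are illustrative only, and the actual spike model in Appendix Equation 11 may differ.

```python
import math

def m_step(q, R, n_pix, eps=1e-9):
    """One M-step (Equation 3) for a toy 1-D circular pattern.

    q[t][x] : variational posterior over eye position x at time t
    R[t][j] : spike count of (hypothetical) neuron j in time bin t
    Returns A' as a list of log-rates, one per pixel.
    Assumes S = A (D = identity), rate = exp(A_i), and a flat prior p(A).
    """
    counts = [0.0] * n_pix  # q-weighted spike counts attributed to each pixel
    weight = [0.0] * n_pix  # q-weighted number of times each pixel was observed
    for q_t, R_t in zip(q, R):
        for x, qx in enumerate(q_t):
            for j, r in enumerate(R_t):
                i = (j + x) % n_pix  # pixel seen by neuron j at eye position x
                counts[i] += qx * r
                weight[i] += qx
    # Zeroing the derivative of the expected Poisson log-likelihood gives
    # exp(A_i) = counts_i / weight_i; eps guards against log(0) and 0/0.
    return [math.log(max(c, eps) / max(w, eps)) for c, w in zip(counts, weight)]

def expected_loglik(A, q, R):
    """Objective of Equation 3 with a flat prior (log R! constant dropped)."""
    n = len(A)
    total = 0.0
    for q_t, R_t in zip(q, R):
        for x, qx in enumerate(q_t):
            for j, r in enumerate(R_t):
                a = A[(j + x) % n]
                total += qx * (r * a - math.exp(a))
    return total

# Toy data: the eye is known to be at x = 0 in the first bin and x = 1 in the second.
q_demo = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
R_demo = [[2, 1, 3], [1, 3, 2]]
A_new = m_step(q_demo, R_demo, n_pix=3)
print([round(math.exp(a), 3) for a in A_new])  # estimated rate per pixel, about [2, 1, 3]
```

With a non-flat prior or a different rate function, this closed form generally no longer applies and the maximization would need an iterative optimizer.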
Equation 2 estimates the eye position at time \(t\), \(X_t\), given the spikes \(R_{0:T}\) and the current estimate of the spatial pattern \(A^{\prime}\), while Equation 3 estimates \(A\) given the spikes \(R_{0:T}\) and the estimated eye positions \(X_{0:T}\). The traditional EM algorithm applies these two equations repeatedly for some number of iterations. For simplicity, \(A\) can be initialized to zero. Note that although repeated application of these update equations (initialized with \(A = 0\)) is guaranteed to converge to a critical point of \(\log p(A|R)\), the updates are still non-causal: they require spikes from the future to estimate quantities at the current time \(t\). Moreover, Equation 3 is not amenable to online processing because it requires optimizing over a batch of quantities from \(t = 0\) to \(T\).
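The non-causality of Equation 2 can be illustrated with a small Python sketch. It compares a causal filter, \(p(X_t|R_{0:t}, S)\), with the smoothed posterior \(p(X_t|R_{0:T}, S)\) that Equation 2 requires, using a forward-backward recursion. Everything specific here is an assumption made for illustration rather than the paper's model: a three-position circular grid, a hypothetical random-walk prior on eye position (`transition`), and Poisson rates \(\exp(S_i)\) with one neuron per pixel. It is not the online approximation developed in the Appendix.

```python
import math

def normalize(v):
    z = sum(v)
    return [vi / z for vi in v]

def transition(x_prev, x, n, p_stay=0.6):
    """Hypothetical circular random-walk prior p(X_t = x | X_{t-1} = x_prev), n >= 3."""
    d = (x - x_prev) % n
    if d == 0:
        return p_stay
    if d in (1, n - 1):
        return (1.0 - p_stay) / 2.0
    return 0.0

def likelihoods(R, S):
    """lik[t][x] proportional to p(R_t | X_t = x, S), toy rate exp(S[(j + x) % n])."""
    n = len(S)
    out = []
    for R_t in R:
        ll = [sum(r * S[(j + x) % n] - math.exp(S[(j + x) % n])
                  for j, r in enumerate(R_t)) for x in range(n)]
        m = max(ll)
        out.append([math.exp(l - m) for l in ll])  # rescaled for numerical stability
    return out

def forward(R, S):
    """Causal filter: alpha[t][x] = p(X_t = x | R_{0:t}, S), uniform prior on X_0."""
    n = len(S)
    lik = likelihoods(R, S)
    alpha = [normalize([lik[0][x] / n for x in range(n)])]
    for t in range(1, len(R)):
        pred = [sum(alpha[-1][xp] * transition(xp, x, n) for xp in range(n))
                for x in range(n)]
        alpha.append(normalize([lik[t][x] * pred[x] for x in range(n)]))
    return alpha

def smooth(R, S):
    """Non-causal smoother: q[t][x] = p(X_t = x | R_{0:T}, S), as in Equation 2."""
    n = len(S)
    lik = likelihoods(R, S)
    alpha = forward(R, S)
    beta = [1.0] * n
    q = [None] * len(R)
    q[-1] = alpha[-1]  # at the final step, filtering and smoothing coincide
    for t in range(len(R) - 2, -1, -1):
        # Backward message: evidence about X_t carried by spikes after time t.
        beta = normalize([sum(transition(x, xn, n) * lik[t + 1][xn] * beta[xn]
                              for xn in range(n)) for x in range(n)])
        q[t] = normalize([alpha[t][x] * beta[x] for x in range(n)])
    return q

# Two spike trains that share the first bin and differ only in the later bin.
S_demo = [0.0, 1.0, 2.0]
R_a = [[1, 1, 1], [0, 1, 3]]  # later bin favors eye position 0
R_b = [[1, 1, 1], [3, 0, 1]]  # later bin favors eye position 2
print("filtered t=0:", forward(R_a, S_demo)[0], forward(R_b, S_demo)[0])
print("smoothed t=0:", smooth(R_a, S_demo)[0], smooth(R_b, S_demo)[0])
```

In this toy setting the two spike trains share the same first bin, so the filtered estimate at \(t = 0\) is identical for both, whereas the smoothed estimate at \(t = 0\) shifts toward whichever eye position the later bin supports.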