# Optimization, Learning, Interactive Vision (9/7/2016)

Recap of the guest lecture by Chris Rozell:

• Structure of a neuron
• Optimization:
1. Dictionary learning (sparse coding):
• Optimize: $Error(\Phi, X) + \lambda \cdot Sparsity(X)$
• $Error(\Phi, X) = \sum_{l=1}^m \lVert Y^{l}-\Phi X^{l} \rVert^2$
• This objective is not convex, so finding a global minimum is intractable; a local minimum can be found by gradient descent.
2. Inference (compressed sensing):
• The dictionary $\Phi$ is fixed. Given an image $Y$, find the coefficients $X$.
• Optimize: $Error(X) + \lambda \cdot Sparsity(X)$
• With $\Phi$ fixed, $Error(X)$ is convex (in the single variable $X$), so a global minimum can be found efficiently (see the sketch after this list).
• Thm: a function $f: \mathbb{R}^n \to \mathbb{R}_+$ can be minimized efficiently provided:
1. f is convex
2. f is well-rounded.
• Surround effect and inhibition in V1 neurons
• Data can be explained by models, but there are problems:
1. A lot of models, but not much theory
2. Optimization: how can we justify a hypothesis that brains are innately solving complicated optimization problems?
• Could dictionaries be learned by evolution?
– Optimization via trial and error (the strong survive, the rest die) over the course of a very, very long time (~4 billion years $\approx 4 \cdot 10^9$ generations)
– Is $4 \cdot 10^9$ generations enough to reach a globally optimal solution? There are $\sim 10^6$ sensors in the retina and $2^{10^6}$ possible patterns in V1…
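
To make the inference step concrete, here is a minimal sketch, assuming the sparsity penalty is the $\ell_1$ norm (the notes leave it unspecified). It uses ISTA (iterative soft-thresholding), a standard solver for this convex problem; the function name and parameters are illustrative, and the $\frac{1}{2}$ in the objective is a convention that only rescales $\lambda$.

```python
import numpy as np

def sparse_inference(Y, Phi, lam=0.1, iters=200):
    """Inference step: with Phi fixed, minimize
    (1/2) * ||Y - Phi @ X||^2 + lam * ||X||_1 over X
    via ISTA (iterative soft-thresholding)."""
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2   # 1/L, L = Lipschitz constant of the gradient
    X = np.zeros(Phi.shape[1])
    for _ in range(iters):
        grad = Phi.T @ (Phi @ X - Y)           # gradient of the quadratic error term
        Z = X - step * grad                    # gradient step on the error
        X = np.sign(Z) * np.maximum(np.abs(Z) - step * lam, 0.0)  # soft-threshold: the l1 prox
    return X
```

Because the objective is convex in $X$ once $\Phi$ is fixed, any initialization reaches a global minimum; dictionary learning, which also varies $\Phi$, is the non-convex part.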

Learning: What can we propose as a general learning process?

We can start by modeling a single neuron.

McCulloch-Pitts model (1943): a binary threshold unit over inputs $x_i \in \{0, 1\}$ that outputs 1 iff its weighted input sum reaches a threshold, $\sum_i w_i x_i \geq \theta$.

Using this model, we can build any Boolean logic (sketched in code below):

OR: $\sum x_i \geq 1$
AND: $\sum x_i \geq n$
NOT: $-x \geq 0$
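
As a quick sketch (hypothetical helper names, not from the lecture), here are these three gates as threshold units in Python:

```python
def mp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts unit: output 1 iff the weighted input sum
    reaches the threshold, else 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

# OR: all weights 1, threshold 1 -- fires if any input is on
OR = lambda *xs: mp_neuron(xs, [1] * len(xs), 1)
# AND: all weights 1, threshold n -- fires only if every input is on
AND = lambda *xs: mp_neuron(xs, [1] * len(xs), len(xs))
# NOT: weight -1, threshold 0 -- fires exactly when x = 0
NOT = lambda x: mp_neuron([x], [-1], 0)

assert OR(0, 1, 0) == 1 and AND(1, 0) == 0 and NOT(1) == 0
```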

So, this is a model of a single neuron, and by composing neurons, a computer. What about the learning process?

Labels: $X \in \mathbb{R}^n$ is labeled as “+” or “-” ($l(X) \in \{1, -1\}$)

Assume: $\exists W^*$ s.t. $W^* \cdot X \geq \gamma$ if $l(X) = 1$, and $W^* \cdot X \leq -\gamma$ if $l(X) = -1$. Also, $\lVert W^* \rVert = 1$ and $\lVert X \rVert \leq 1$.

Goal: Find a $W$ that classifies the $X$'s correctly

Algorithm (Perceptron):

1. Initialize $W = 0$
2. Repeat: on a mistake (i.e., $W \cdot X$ does not “match” $l(X)$):
$W \longleftarrow W + l(X) \cdot X$
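
A minimal sketch in Python (the `perceptron` name and the `epochs` cap are illustrative; a tie $W \cdot X = 0$ is counted as a mistake, matching the proof below):

```python
import numpy as np

def perceptron(X, labels, epochs=100):
    """Perceptron as in the notes: start at W = 0 and, on each
    mistake, add l(X) * X to W."""
    W = np.zeros(X.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for x, l in zip(X, labels):      # l in {+1, -1}
            if l * (W @ x) <= 0:         # sign of W.x does not match the label
                W += l * x               # the update rule from step 2
                mistakes += 1
        if mistakes == 0:                # all points classified correctly: done
            break
    return W
```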

Proof:

1. $\cos(\theta) = \frac{W\cdot W^*}{\lVert W \rVert}$, since $\lVert W^* \rVert = 1$.
2. On each mistake (i.e., either $W\cdot X \leq 0$ when $l(X)=1$, or $W\cdot X \geq 0$ when $l(X)=-1$):
$W\cdot W^* \longleftarrow W\cdot W^* + l(X)(X\cdot W^*) \geq W\cdot W^* + \gamma$
$\lVert W \rVert ^2 \longleftarrow (W+l(X)X)\cdot(W+l(X)X) = \lVert W\rVert^2 + 2l(X)(W\cdot X) + \lVert X \rVert ^2 \leq \lVert W\rVert^2 + 1$, since on a mistake $l(X)(W\cdot X) \leq 0$ and $\lVert X \rVert \leq 1$.
3. After $t$ mistakes (starting from $W = 0$):
$W\cdot W^* \geq t\gamma$
$\lVert W \rVert ^2 \leq t$
4. So, $\cos(\theta) = \frac{W\cdot W^*}{\lVert W \rVert} \geq \frac{t\gamma}{ \sqrt{t} } = \gamma\sqrt{t}$ after $t$ mistakes.
5. Since $\cos(\theta) \leq 1$, we get $t \leq \frac{1}{\gamma^2}$.

The algorithm finds a $W$ s.t. $W\cdot X > 0$ iff $l(X)=1$ after making at most $1/\gamma^2$ mistakes.
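
As a quick sanity check of the bound, a hypothetical run of the perceptron sketch above on synthetic data satisfying the margin assumption with $\gamma = 0.1$:

```python
rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(500, 2))
pts = pts[np.linalg.norm(pts, axis=1) <= 1.0]   # enforce ||X|| <= 1
w_star = np.array([0.6, 0.8])                   # a unit-norm W* (hypothetical)
pts = pts[np.abs(pts @ w_star) >= 0.1]          # enforce margin gamma = 0.1
labels = np.where(pts @ w_star >= 0, 1, -1)

W = perceptron(pts, labels, epochs=200)         # bound: at most 1/0.1^2 = 100 mistakes
assert all(l * (W @ x) > 0 for x, l in zip(pts, labels))
```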

# Interactive Vision

Interactive vision is summarized as:

• Environment: evolutionary rationale rooted in improved motor control (four Fs)
• Serial Processing: parsing an image as a sequence of saccades
• Prediction: visual learning allows prediction
• Interaction with other systems: the motor system and vision are closely related
• Recurrence/Feedback: rich recurrence and feedback during recognition
• Memory: memory plays a role in what we see

Also, there are computational advantages of interactive vision:

1. Segmentation and recognition can be done more efficiently together
2. Movement (of eye, head, body) makes many visual computations simpler
3. Interactive perception simplifies the learning problem