Visual Pathway: Retina (~1 million sensors) → V1 (~140 million) → V2, V3, V4, V5, V6 → IT → Hippocampus
There are certain visual patterns that neurons in V1 respond to.
For example, from Hubel and Wiesel’s paper:
Question 1: Why these particular patterns?
Question 2: How are these patterns detected?
Image representation using basis vectors
We can think of an image of size n (total number of pixels) as a vector of pixel values:
(x_{1}, x_{2}, …, x_{n}) ∈ ℝ^{n}
We can also represent an image using a set of basis vectors.
For example, if we have n vectors which have a single 1 in different “coordinates”:
p_{1} = (1,0,0, …, 0)
p_{2} = (0,1,0, …, 0)
p_{3} = (0,0,1, …, 0)
…
p_{n} = (0,0,0, …, 1)
then we can represent ANY image using a linear combination of these vectors, since they form a basis (there are n of them, and they are linearly independent; in fact, pairwise orthogonal):
Any image X = ∑α_{i}p_{i}
For what kind of images would this pixel basis be “good”?
One possible criterion is counting the number of nonzero coefficients (the α_{i}s) used to represent an image: using a small number of basis vectors is better!
Then, these vectors are good for representing an image with few dots:
But not good for representing a dense image:
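To make this concrete, here is a small sketch in NumPy (the patch size and pixel positions are made up for illustration). In the pixel basis, the coefficient α_{i} is just the value of pixel i, so the cost of an image is simply its number of nonzero pixels:

```python
import numpy as np

n = 64  # assumed size: an 8x8 patch flattened into a vector

# A "few dots" image: only 3 nonzero pixels.
sparse_img = np.zeros(n)
sparse_img[[3, 17, 42]] = 1.0

# A dense image: every pixel carries some nonzero value.
rng = np.random.default_rng(0)
dense_img = rng.uniform(0.1, 1.0, size=n)

# In the pixel basis, #nonzero coefficients = #nonzero pixels.
print(np.count_nonzero(sparse_img))  # 3  -> cheap in the pixel basis
print(np.count_nonzero(dense_img))   # 64 -> expensive in the pixel basis
```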
What is another possible set of basis vectors? (“features”)?
Hadamard Basis (all entries are ±1):
Φ_{1} = (1, 1, 1, 1, 1, 1, …, 1, 1)
Φ_{2} = (1, −1, 1, −1, 1, −1, …, 1, −1)
Φ_{3} = (1, 1, −1, −1, 1, 1, …, −1, −1)
…
Φ_{n} = (…)
Because these n vectors are orthogonal to each other (Φ_{i}•Φ_{j} = 0 for all i ≠ j), they can represent any image:
Rank(Φ_{1}, Φ_{2}, Φ_{3}, …, Φ_{n}) = n
So, ∀X ∈ ℝ^{n}, ∃α_{i} s.t. X = ∑α_{i}Φ_{i}
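A quick way to check this numerically is the Sylvester construction, which builds ±1 Hadamard matrices for n a power of 2. Orthogonality also makes finding the coefficients α trivial (no general matrix inversion needed):

```python
import numpy as np

def sylvester_hadamard(k):
    """Build the 2^k x 2^k Hadamard matrix by the Sylvester construction:
    H_{2m} = [[H_m, H_m], [H_m, -H_m]], starting from H_1 = [1]."""
    H = np.array([[1]])
    for _ in range(k):
        H = np.block([[H, H], [H, -H]])
    return H

n = 8
H = sylvester_hadamard(3)        # rows of H are the basis vectors Phi_1..Phi_n

# Pairwise orthogonal with |Phi_i|^2 = n, so rank is n.
assert np.array_equal(H @ H.T, n * np.eye(n, dtype=int))

# Any "image" X can be written as X = sum_i alpha_i Phi_i = H.T @ alpha;
# orthogonality gives alpha = (H @ X) / n directly.
X = np.arange(n, dtype=float)
alpha = (H @ X) / n
assert np.allclose(H.T @ alpha, X)
```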
So, there are many, many possible bases. Which are better and which should we pick?
It depends on shapes in images: circles, rings, or lines…
Many “natural” classes of shapes form bases, but some are not expressive enough.
For example, horizontal and vertical halfspaces in 2D cannot represent a single pixel (try it!), and therefore cannot represent all images. (There are only 2k such halfspaces, but we need k^{2} basis vectors to cover the whole space of k-by-k images.)
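This rank deficiency is easy to verify numerically. A small sketch (k = 4 is an assumption for illustration): stack the 2k halfspace indicator images as vectors, check their rank, and confirm by least squares that a single pixel lies outside their span.

```python
import numpy as np

k = 4  # a small k x k image, for illustration
basis = []
for r in range(1, k + 1):
    img = np.zeros((k, k))
    img[:r, :] = 1.0              # horizontal halfspace: top r rows on
    basis.append(img.ravel())
for c in range(1, k + 1):
    img = np.zeros((k, k))
    img[:, :c] = 1.0              # vertical halfspace: left c columns on
    basis.append(img.ravel())
A = np.array(basis)               # 2k vectors of length k^2

print(np.linalg.matrix_rank(A))   # 7 = 2k - 1, far short of k^2 = 16

# A single pixel is not in their span: least squares leaves a residual.
e = np.zeros(k * k)
e[0] = 1.0
coef, *_ = np.linalg.lstsq(A.T, e, rcond=None)
resid = np.linalg.norm(A.T @ coef - e)
print(resid > 0)                  # True: the pixel cannot be represented
```

The rank is only 2k − 1 rather than 2k because the full-image halfspace is counted in both the horizontal and vertical families.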
Exercise. Can we represent any single pixel using all halfspaces (at any angle) in 2d (and hence all images)?
Sparse Coding
Olshausen and Field’s paper: let’s use a basis that can represent images as sparsely as possible!
Problem: Given a set of images Y^{1}, Y^{2}, …, Y^{m} ∈ ℝ^{n},
find a basis Φ: Φ_{1}, Φ_{2}, …, Φ_{n} and vectors X^{1}, X^{2}, X^{3}, …, X^{m} such that Y^{l} = ΦX^{l} AND every X^{l} is sparse.
We need to minimize:
E = ∑_{l} ( ‖Y^{l} − ΦX^{l}‖² + λ ∑_{i} S(x_{i}^{l}) )
where S(·) is a cost function that measures the sparseness of a vector, and λ > 0 trades off reconstruction error against sparseness.
How do we measure “sparseness” of a vector?
The sparseness function should incur a higher cost for a vector whose “activity” is more spread out.
One option is the L0 norm: ‖x‖₀ = # of nonzero entries in the vector.
However, minimizing the L0 norm is an NP-hard problem. It also doesn’t capture sparseness well, because it ignores the magnitude of each entry.
What about the L1 norm, ‖x‖₁ = ∑|x_{i}| (the sum of the absolute values of all entries)?
Example. Suppose we have three normalized vectors:
a = (1, 0, …, 0)
b = (1/√n, 1/√n, …, 1/√n)
c = a normalized vector whose entries decay geometrically, e.g. proportional to (1, 1/2, 1/4, …)
And here is the sparseness cost of each vector under each norm:

      L0 norm    L1 norm
  a      1          1
  b      n          √n
  c      n         O(1)

The L0 norm gives the same sparseness cost for b and c, even though c is sparser: the magnitudes of its entries approach 0.
The L1 norm correctly captures this fact (c is scored as sparser than b).
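A short sketch to check these costs numerically (the geometric decay rate 1/2 for c is an assumed choice; any fixed ratio below 1 behaves the same way):

```python
import numpy as np

n = 100
a = np.zeros(n)
a[0] = 1.0                        # a single spike
b = np.full(n, 1.0 / np.sqrt(n))  # activity spread evenly
c = 2.0 ** -np.arange(n)          # geometrically decaying entries
c /= np.linalg.norm(c)            # normalize

l0 = lambda v: np.count_nonzero(v)
l1 = lambda v: np.sum(np.abs(v))

# L0 cannot tell b from c: both have n nonzero entries.
print(l0(a), l0(b), l0(c))        # 1, n, n

# L1 ranks c as sparser: L1(b) = sqrt(n), while L1(c) stays O(1).
print(l1(a), l1(b), l1(c))
```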
Other functions can be used to measure sparsity, and the paper presents two other such functions: S(x) = −e^{−x²} and S(x) = log(1 + x²).
Algorithm (Gradient Descent):
Minimize E = ∑_{l} ( ‖Y^{l} − ΦX^{l}‖² + λ ∑_{i} S(x_{i}^{l}) )
- Initialize Φ with random vectors.
- Compute each X^{l} by finding a local optimum of E with Φ fixed [formula (5) in the paper].
- Compute the gradient of E in Φ and update Φ [formula (6) in the paper].
- Repeat until the change in E is less than a chosen threshold.
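A minimal sketch of this alternating scheme, assuming S(x) = |x| (so the inner step becomes an ISTA-style soft-thresholding loop) rather than the paper’s exact formulas (5)/(6); the data, dimensions, and step sizes are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, m = 16, 32, 200                 # pixels, basis size, # images (assumed)
Y = rng.normal(size=(d, m))           # stand-in "images"
Phi = rng.normal(size=(d, n))
Phi /= np.linalg.norm(Phi, axis=0)    # random initial basis, unit columns

lam, eta = 0.1, 0.01                  # sparseness weight, basis step size
X = np.zeros((n, m))

def energy(Phi, X):
    return np.sum((Y - Phi @ X) ** 2) + lam * np.sum(np.abs(X))

prev = np.inf
for _ in range(50):
    # Inner step: sparse codes X, a local optimum of E with Phi fixed.
    L = np.linalg.norm(Phi.T @ Phi, 2)        # Lipschitz scale for the step
    for _ in range(20):
        Z = X - (Phi.T @ (Phi @ X - Y)) / L   # gradient step on the error
        X = np.sign(Z) * np.maximum(np.abs(Z) - lam / (2 * L), 0.0)  # shrink

    # Outer step: gradient direction of E in Phi, then renormalize columns.
    Phi -= eta * (Phi @ X - Y) @ X.T
    Phi /= np.linalg.norm(Phi, axis=0)

    e = energy(Phi, X)
    if abs(prev - e) < 1e-6:                  # stop when E stops changing
        break
    prev = e
```

On natural image patches (instead of the random Y used here), this kind of loop is what produces the oriented, localized basis functions reported in the paper.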
We note that this algorithm is a heuristic: neither the running time nor the quality of the solution it finds has rigorous guarantees in general. However, as we saw with the implementation, it produces very interesting results, and the paper is considered seminal in the field.