# Tracking Features Lecture

## Changes from 0d33616a0ab52350e4c7ebfce26d29de8e17411e to 13176842df7a1a33b69c7a2090e836cb7bf15161

---
title: Tracking Features
categories: lecture
---

- aperture problem [page 78]

# Briefing

## Corners

![Universitetsområdet i Ålesund](Images/ntnuaes1.jpg)

![Universitetsområdet i Ålesund (ny vinkel)](Images/ntnuaes2.jpg)

+ What are distinctive points in the image?
+ Distinctive points can (to some extent) be matched in two different images.

[More Images](https://www.flickr.com/photos/ntnu-trondheim/collections/72157632165205007/)

## Corner Correspondence

+ Two images of the same scene
$I_1,I_2: \Omega\subset\mathbb{R}^2\to\mathbb{R}_+ ; \mathbf{x}\mapsto I_1(\mathbf{x}),I_2(\mathbf{x})$
+ Different in general

*Why are they different?*

+ Firstly - different colour in different directions
+ Lambertian assumption
+ Secondly - noise
+ let's assume this is insignificant
+ Thirdly - same point in different positions
+ different image points $\mathbf{x}_1,\mathbf{x}_2$
correspond to the same 3D point $p$

**Note** Two approaches:

1.  **Today** Motion modelling using differentiation
+ adequate for slow motion, say 2-3 pixels per frame
+ allows for subpixel accuracy
2.  *Next Session:* feature descriptors

## Motion Model

Suppose we have a feature point $\mathbf{x}_1$ in Image 1.
The corresponding feature $\mathbf{x}_2$ in Image 2 is the result of
a movement $h:\mathbb{R}^2\to\mathbb{R}^2$ so that
$\mathbf{x}_2=h(\mathbf{x}_1)$.

### Brightness Constancy Constraint

+  Suppose we photograph empty space except for a single point $p$
- *Brightness Constancy Constraint*

$$I_1(\mathbf{x}_1) = I_2(\mathbf{x}_2) \sim \mathcal{R}(p)$$

+  Simple dislocation from $\mathbf{x}_1$ to $\mathbf{x}_2$
+  Motion: $h: \mathbf{x}_1\mapsto\mathbf{x}_2$ so that
$$\forall \mathbf{x}_1\in\Omega\cap h^{-1}(\Omega)\subset\mathbb{R}^{2}, \;I_1(\mathbf{x}_1) = I_2(h(\mathbf{x}_1))$$

## Feature Tracking

+ Estimator
$$\hat h = \arg\min_h\sum_{\tilde{\mathbf{x}}\in W(\mathbf{x})} ||I_1(\tilde{\mathbf{x}})-I_2(h(\tilde{\mathbf{x}}))||^2$$

+ The window, or aperture, $W(\vec{x})$
+ Choose $h$ from a family of functions, parameterised by $\alpha$
+ translational: $\alpha=\Delta\mathbf{x}$
+ affine: $\alpha=\{A,\mathbf{d}\}$
+ **Aperture problem:** cannot distinguish points on a blank wall

<details>
<summary>Caveat: AlternativeMotion Models</summary>
### Motion Models

+ **Translational Motion Model:**
$$h(\mathbf{x}_1) = \mathbf{x}_1 + \mathbf{\Delta x}$$
+ **Affine Motion Model:**
$$h(\mathbf{x}_1) = A\mathbf{x}_1 + \mathbf{d}$$
+ **Projective Motion Model:**
$$h(\mathbf{x}_1) = H\mathbf{x}_1$$
where
$H\in\mathbb{R}^{3\times3}$ is defined up to a scalar factor.
</details>

<details>
<summary>Caveat: Intencity Transformation</summary>
### Intencity Transformation

+ Need to accept changes to the intencity

$$I_1(\mathbf{x}_1) = I_2(h(\mathbf{x}_1)) + n(h(\mathbf{x}_1))$$

+ Occlusions
+ Non-Lambertian reflection
+ Taken at different time?  Different ambient light?
</details>

### Infinitesimal Model

+ Consider simple translational model
$$I_1(\textbf{x})= I_2(h(\textbf{x}))= I_2(\textbf{x}+\Delta\textbf{x})$$
+ Consider infitesimally small $\Delta\textbf x$
+ Model on a time axis
+ two images taken infinitesimally close in time
+ ... under motion

1. First write $\mathbf{\Delta x} = \mathbf{u}dt$, and
rewrite the brightness constancy
$$I(\mathbf{x}(t),t) = I(\mathbf{x}(t)+\mathbf{u}dt,t+dt)$$
2.  Apply Taylor series expansion and ignore higher-order terms
$$\nabla I(\mathbf{x}(t),t)^\mathrm{T}\mathrm{u} + I_t(\mathbf{x}(t),t) = 0$$
where
$$\nabla I(\mathbf{x},t) = \begin{bmatrix} I_x(\mathbf{x},t)\\ I_y(\mathbf{x},t) \end{bmatrix} = \begin{bmatrix}\frac{\partial I}{\partial x}(\mathbf{x},t)\\ \frac{\partial I}{\partial y}(\mathbf{x},t) \end{bmatrix} \in\mathbb{R}^2$$
and
$$I_t(\mathbf{x},t) = \frac{\partial I(\mathbf{x},t)}{\partial t}\in \mathbb{R}$$
3. Simplify
$$\nabla I(\mathbf{x}(t),t)^\mathrm{T}\mathrm{u} + I_t(\mathbf{x}(t),t) = 0$$

+ *Brightness Constancy Constraint* for the simplest possible continuous model
+ Two applications
- optical flow: fix a position $\mathbf x$ and consider particles passing through
- feature tracking: fix a particle $x(t)$ an track it through space

### Solving for $\textbf{u}$

+ Consider the equation
$$\nabla I^\mathrm{T}\mathrm{u} + I_t = 0$$
+ This is our **target equation**.
+ There are infititly many solutions,
+ *aperture problem*
+ one scalar equation with two unknowns ($\mathrm{u}$ is a 2D vector)
+ We can solve for the component in the direction of the gradient though

1. Scalar projection of $\mathbf u$ onto $\nabla I$.
$$\frac{\nabla I^\mathrm{T}\mathrm{u}}{||\nabla I||} = - \frac{I_t}{||\nabla I||}$$
2. Multiplying by $\nabla I/||\nabla I||$, we get the vector projection:
$$\mathbf u_n = \frac{\nabla I^\mathrm{T}\mathrm{u}}{||\nabla I||}\cdot\frac{\nabla I}{||\nabla I||} = - \frac{I_t}{||\nabla I||}\cdot\frac{\nabla I}{||\nabla I||}$$

### Least squared errors estimate

+ Consider again the *target equation*
$$\nabla I^\mathrm{T}\mathrm{u} + I_t = 0$$
+ Equality is unrealistic - too much noise
+ Minimise instead the error: $$E=\nabla I^\mathrm{T}\mathrm{u} + I_t$$
+ Integrate over a window with sufficient texture
+ allows us to estimate $u$ in two dimensions
+ Minimise the sum of squared errors over the window:
$$E_b(\mathbf{u}) = \sum_{W(\mathbf{x})} [\nabla I^T(\tilde{\mathbf{x}},t)\mathbf{u}(\mathbf{x})+I_t(\tilde{\mathbf{x}},t)]^2$$
+ The solution is standard optimisation by calculus
+ cf. least squares regression in statistics
+ cf. extremal points in first-year calculus
+ Differentiate
$$\nabla E_b(\mathbf{u}) = 2\sum_{W(\mathbf{x})} \nabla I [\nabla I^T\mathbf{u}+I_t]$$
+ Spelling out the matrices, we have
$$\nabla E_b(\mathbf{u}) = 2\sum_{W(\mathbf{x})} \bigg(\begin{bmatrix} I_x^2 & I_xI_y \\ I_xI_y & I_y^2\end{bmatrix}\mathbf{u} + \begin{bmatrix} I_xI_t \\ I_yI_t\end{bmatrix}\bigg)$$
+ To minimise $E_b$, the derivative should be zero
$$0 = \begin{bmatrix} \sum I_x^2 & \sum I_xI_y \\ \sum I_xI_y & \sum I_y^2 \end{bmatrix}\mathbf{u} + \begin{bmatrix} \sum I_xI_t \\ \sum I_yI_t\end{bmatrix}$$
+ We refer to the first matrix as $G$, so that
$$G\mathbf{u} + \mathbf{b} = 0$$
+ If $G$ is non-singular, we have
$$\mathbf{u} = - G^{-1}\mathbf{b}$$
+ This gives us the motion vector $\mathbf{u}$

## Algorithm (4.1 of Ma 2004)

Compute
$$G(\mathbf{x}) = \begin{bmatrix} \sum I_x^2 & \sum I_xI_y \\ \sum I_xI_y & \sum I_y^2 \end{bmatrix}$$
and
$$b(\mathbf{x},t) = \begin{bmatrix} \sum I_xI_t \\ \sum I_yI_t\end{bmatrix}$$
at every pixel $\mathbf{x}$.

+ **Feature tracking** choose feature points $x_1,x_2,\ldots$
where $G(\mathbf{x})$ is invertible
+ **Optical flow** choose points $x_1,x_2,\ldots$ on a fixed grid

$$\mathbf{u}(x,t) = \begin{cases} - G^{-1}b&\quad\text{if defined}\\ 0&\quad\text{otherwise} \end{cases}$$

+ **Feature tracking** at time $t+1$ replace point $x$ by
$x+\mathbf{u}(x,t)$
+ **Optical flow** at time $t+1$ repeat operation at the same point $x$

### Practical implementation

+ Typically, the Sobel filter is appropriate for the spatial derivatives $\nabla I$
+ On the time axis, it is common to use a first order approximation (difference)
for the derivative $I_t$

## Challenge: Project

In a week or two, we will dedicate three sessions for a more substantial project.
This includes one self-study session 12 October, and one session before it and
one session after it.

The goal of the project is to make a prototype of an object tracker, that can identify
an object in a video stream and track it, frame by frame.

One approach for this project is the one we have introduced in this session, building on
last.  First, detect the corners and then track them differentially in the next frame.
This is not the only approach, however, and in practice one may have to combine several
techniques to get a robust solution.  It is a good idea to start thinking about the project
at this point, and if you have time, implement the the feature point tracker sooner rather
than later.