Tracking Features

Briefing

aperture problem [page 78]

Briefing

Corners

Universitetsområdet i Ålesund (ny vinkel)

What are distinctive points in the image?
Distinctive points can (to some extent) be matched in two different images.

More Images

Corner Correspondence

Two images of the same scene \(I_1,I_2: \Omega\subset\mathbb{R}^2\to\mathbb{R}_+ ; \mathbf{x}\mapsto I_1(\mathbf{x}),I_2(\mathbf{x})\)
Different in general

Why are they different?

Firstly - different colour in different directions
- Lambertian assumption
Secondly - noise
- let’s assume this is insignificant
Thirdly - same point in different positions
- different image points \(\mathbf{x}_1,\mathbf{x}_2\) correspond to the same 3D point \(p\)

Note Two approaches:

Today Motion modelling using differentiation
- adequate for slow motion, say 2-3 pixels per frame
- allows for subpixel accuracy
Next Session: feature descriptors

Motion Model

Suppose we have a feature point \(\mathbf{x}_1\) in Image 1. The corresponding feature \(\mathbf{x}_2\) in Image 2 is the result of a movement \(h:\mathbb{R}^2\to\mathbb{R}^2\) so that \(\mathbf{x}_2=h(\mathbf{x}_1)\).

Brightness Constancy Constraint

Suppose we photograph empty space except for a single point \(p\)
- Brightness Constancy Constraint

\[I_1(\mathbf{x}_1) = I_2(\mathbf{x}_2) \sim \mathcal{R}(p)\]

Simple dislocation from \(\mathbf{x}_1\) to \(\mathbf{x}_2\)
Motion: \(h: \mathbf{x}_1\mapsto\mathbf{x}_2\) so that \[\forall \mathbf{x}_1\in\Omega\cap h^{-1}(\Omega)\subset\mathbb{R}^{2}, \;I_1(\mathbf{x}_1) = I_2(h(\mathbf{x}_1))\]

Feature Tracking

Estimator \[\hat h = \arg\min_h\sum_{\tilde{\mathbf{x}}\in W(\mathbf{x})} ||I_1(\tilde{\mathbf{x}})-I_2(h(\tilde{\mathbf{x}}))||^2\]
The window, or aperture, \(W(\vec{x})\)
Choose \(h\) from a family of functions, parameterised by \(\alpha\)
- translational: \(\alpha=\Delta\mathbf{x}\)
- affine: \(\alpha=\{A,\mathbf{d}\}\)
Aperture problem: cannot distinguish points on a blank wall

Caveat: AlternativeMotion Models

### Motion Models

Translational Motion Model: \[h(\mathbf{x}_1) = \mathbf{x}_1 + \mathbf{\Delta x}\]
Affine Motion Model: \[h(\mathbf{x}_1) = A\mathbf{x}_1 + \mathbf{d}\]
Projective Motion Model: \[h(\mathbf{x}_1) = H\mathbf{x}_1\] where \(H\in\mathbb{R}^{3\times3}\) is defined up to a scalar factor.

Caveat: Intencity Transformation

### Intencity Transformation

Need to accept changes to the intencity

\[I_1(\mathbf{x}_1) = I_2(h(\mathbf{x}_1)) + n(h(\mathbf{x}_1))\]

Occlusions
Non-Lambertian reflection
Taken at different time? Different ambient light?

Infinitesimal Model

Consider simple translational model \[I_1(\textbf{x})= I_2(h(\textbf{x}))= I_2(\textbf{x}+\Delta\textbf{x})\]
Consider infitesimally small \(\Delta\textbf x\)
Model on a time axis
- two images taken infinitesimally close in time
- … under motion

First write \(\mathbf{\Delta x} = \mathbf{u}dt\), and rewrite the brightness constancy \[I(\mathbf{x}(t),t) = I(\mathbf{x}(t)+\mathbf{u}dt,t+dt)\]
Apply Taylor series expansion and ignore higher-order terms \[\nabla I(\mathbf{x}(t),t)^\mathrm{T}\mathrm{u} + I_t(\mathbf{x}(t),t) = 0\] where \[\nabla I(\mathbf{x},t) = \begin{bmatrix} I_x(\mathbf{x},t)\\ I_y(\mathbf{x},t) \end{bmatrix} = \begin{bmatrix}\frac{\partial I}{\partial x}(\mathbf{x},t)\\ \frac{\partial I}{\partial y}(\mathbf{x},t) \end{bmatrix} \in\mathbb{R}^2\] and \[I_t(\mathbf{x},t) = \frac{\partial I(\mathbf{x},t)}{\partial t}\in \mathbb{R}\]
Simplify \[\nabla I(\mathbf{x}(t),t)^\mathrm{T}\mathrm{u} + I_t(\mathbf{x}(t),t) = 0\]

Brightness Constancy Constraint for the simplest possible continuous model
Two applications
- optical flow: fix a position \(\mathbf x\) and consider particles passing through
- feature tracking: fix a particle \(x(t)\) an track it through space

Solving for \(\textbf{u}\)

Consider the equation \[\nabla I^\mathrm{T}\mathrm{u} + I_t = 0\]
- This is our target equation.
There are infititly many solutions,
- aperture problem
- one scalar equation with two unknowns (\(\mathrm{u}\) is a 2D vector)
We can solve for the component in the direction of the gradient though

Scalar projection of \(\mathbf u\) onto \(\nabla I\). \[\frac{\nabla I^\mathrm{T}\mathrm{u}}{||\nabla I||} = - \frac{I_t}{||\nabla I||} \]
Multiplying by \(\nabla I/||\nabla I||\), we get the vector projection: \[\mathbf u_n = \frac{\nabla I^\mathrm{T}\mathrm{u}}{||\nabla I||}\cdot\frac{\nabla I}{||\nabla I||} = - \frac{I_t}{||\nabla I||}\cdot\frac{\nabla I}{||\nabla I||} \]

Least squared errors estimate

Consider again the target equation \[\nabla I^\mathrm{T}\mathrm{u} + I_t = 0\]
- Equality is unrealistic - too much noise
- Minimise instead the error: \[E=\nabla I^\mathrm{T}\mathrm{u} + I_t\]
Integrate over a window with sufficient texture
- allows us to estimate \(u\) in two dimensions
- Minimise the sum of squared errors over the window: \[E_b(\mathbf{u}) = \sum_{W(\mathbf{x})} [\nabla I^T(\tilde{\mathbf{x}},t)\mathbf{u}(\mathbf{x})+I_t(\tilde{\mathbf{x}},t)]^2\]
The solution is standard optimisation by calculus
- cf. least squares regression in statistics
- cf. extremal points in first-year calculus
Differentiate \[\nabla E_b(\mathbf{u}) = 2\sum_{W(\mathbf{x})} \nabla I [\nabla I^T\mathbf{u}+I_t]\]
Spelling out the matrices, we have \[\nabla E_b(\mathbf{u}) = 2\sum_{W(\mathbf{x})} \bigg(\begin{bmatrix} I_x^2 & I_xI_y \\ I_xI_y & I_y^2\end{bmatrix}\mathbf{u} + \begin{bmatrix} I_xI_t \\ I_yI_t\end{bmatrix}\bigg)\]
To minimise \(E_b\), the derivative should be zero \[0 = \begin{bmatrix} \sum I_x^2 & \sum I_xI_y \\ \sum I_xI_y & \sum I_y^2 \end{bmatrix}\mathbf{u} + \begin{bmatrix} \sum I_xI_t \\ \sum I_yI_t\end{bmatrix}\]
We refer to the first matrix as \(G\), so that \[G\mathbf{u} + \mathbf{b} = 0\]
If \(G\) is non-singular, we have \[\mathbf{u} = - G^{-1}\mathbf{b}\]
This gives us the motion vector \(\mathbf{u}\)

Algorithm (4.1 of Ma 2004)

Compute \[G(\mathbf{x}) = \begin{bmatrix} \sum I_x^2 & \sum I_xI_y \\ \sum I_xI_y & \sum I_y^2 \end{bmatrix}\] and \[b(\mathbf{x},t) = \begin{bmatrix} \sum I_xI_t \\ \sum I_yI_t\end{bmatrix}\] at every pixel \(\mathbf{x}\).

Feature tracking choose feature points \(x_1,x_2,\ldots\) where \(G(\mathbf{x})\) is invertible
Optical flow choose points \(x_1,x_2,\ldots\) on a fixed grid

\[\mathbf{u}(x,t) = \begin{cases} - G^{-1}b&\quad\text{if defined}\\ 0&\quad\text{otherwise} \end{cases} \]

Feature tracking at time \(t+1\) replace point \(x\) by \(x+\mathbf{u}(x,t)\)
Optical flow at time \(t+1\) repeat operation at the same point \(x\)

Practical implementation

Typically, the Sobel filter is appropriate for the spatial derivatives \(\nabla I\)
On the time axis, it is common to use a first order approximation (difference) for the derivative \(I_t\)

Challenge: Project

In a week or two, we will dedicate three sessions for a more substantial project. This includes one self-study session 12 October, and one session before it and one session after it.

The goal of the project is to make a prototype of an object tracker, that can identify an object in a video stream and track it, frame by frame.

One approach for this project is the one we have introduced in this session, building on last. First, detect the corners and then track them differentially in the next frame. This is not the only approach, however, and in practice one may have to combine several techniques to get a robust solution. It is a good idea to start thinking about the project at this point, and if you have time, implement the the feature point tracker sooner rather than later.