Tracking Features

Exercises
Debrief
- Exercise 4.1 (Ma 2004)

Date 24 September 2021

Exercises

Algebraic Formulation

Exercise 4.1 (Ma 2004)
- Let \(\mathbf{X}\) be a 3D point and \(\mathbf{x}\) its projection
- Consider the image motion \(h(\mathbf{x}) = \mathbf{x}+\Delta\mathbf{x}\)
- What transformation \((R,T)\) must the scene undergo to realise the given \(h(\mathbf{x})\)?
- Hint
  - Recall that \((R,T)\) has six degrees of freedom in general.
  - To produce the given \(h\), some of these six degrees of freedom have to be fixed. Which ones?

Experimental View

Consider the image of a square with corners at \[(0,0); (10,0); (0,10); (10,10),\] and visualise the results of different motions. You can program this in Python or do the calculations by hand. Since the problem is 2D, hand calculation is practical. Python is faster once you know how, but you need to look up a 2D API rather than reuse the 3D functions from the previous motion exercises.

For each of the following models, apply the transformation and plot the result.

- Translation: \(x \mapsto x + \Delta x\), e.g. \(\Delta x=(6,4)\)
- Affine: \(x \mapsto Ax + d\) for an invertible matrix \(A\) and a translation vector \(d\). E.g. \[A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \quad d = (4,10)\]
- Projective: \(x \mapsto Hx\) where \(x\) is in homogeneous co-ordinates and \(H\) is defined up to a scalar factor. E.g. \[H = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}\] Note that the resulting point is given in homogeneous co-ordinates, and we need to scale it so that the third co-ordinate is 1, i.e. \((x,y,z)\mapsto(x/z,y/z)\).

If you find a way to do this in Python, you should try multiple transformations to see how they behave in practice.
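
One possible way to try the three models in Python is sketched below, using NumPy for the arithmetic and, optionally, Matplotlib for the plot. The function names (`translate`, `affine`, `projective`, `plot_outlines`) are made up for this illustration and are not part of any library; the parameter values are the examples given above.

```python
import numpy as np

# Corners of the square, one point per row.
square = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], dtype=float)


def translate(points, delta):
    """Translation: x -> x + delta."""
    return points + np.asarray(delta, dtype=float)


def affine(points, A, d):
    """Affine map: x -> A x + d, applied to every row of `points`."""
    return points @ np.asarray(A, dtype=float).T + np.asarray(d, dtype=float)


def projective(points, H):
    """Projective map: x -> H x in homogeneous co-ordinates."""
    ones = np.ones((points.shape[0], 1))
    homogeneous = np.hstack([points, ones])           # (x, y) -> (x, y, 1)
    mapped = homogeneous @ np.asarray(H, dtype=float).T
    return mapped[:, :2] / mapped[:, 2:3]             # (x, y, z) -> (x/z, y/z)


def plot_outlines(shapes):
    """Plot each (label, points) pair as a closed outline (needs Matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for label, points in shapes:
        outline = points[[0, 1, 3, 2, 0]]             # walk around the corners
        ax.plot(outline[:, 0], outline[:, 1], marker="o", label=label)
    ax.set_aspect("equal")
    ax.legend()
    plt.show()


translated = translate(square, (6, 4))
sheared = affine(square, [[1, 1], [0, 1]], (4, 10))
warped = projective(square, [[1, 1, 0], [0, 1, 1], [1, 1, 1]])

print(translated)
print(sheared)
print(warped)
```

Calling `plot_outlines([("original", square), ("translation", translated), ("affine", sheared)])` draws the shapes in one figure. The projective result for this \(H\) is much smaller than the original square, so it is probably easier to inspect in a figure of its own.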

If you have time

- Exercises 4.2, 4.3 (Ma 2004)
- Exercise 4.4 (Ma 2004)
- Alternatively, you may start implementing the feature tracker in Python/OpenCV or another language.

Debrief

Exercise 4.1 (Ma 2004)

Note: We discussed the solution in the debrief session, and that discussion is far more instructive than the algebraic solution outlined below. Still, this sketch may add some further insight.

Sketch of a solution: Write \[\mathbf{x}=\Pi\mathbf{X},\] where \(\Pi\) is the projection matrix, including the camera parameters.

After the transformation, we have \[\mathbf{x}+\Delta\mathbf{x}=\Pi(R\mathbf{X}+T)=\Pi R\mathbf{X}+\Pi T\]

Substituting \(\mathbf{x}=\Pi\mathbf{X}\) on the left-hand side, we have \[\Pi\mathbf{X}+\Delta\mathbf{x}=\Pi R\mathbf{X}+\Pi T\]

To satisfy this for every \(\mathbf{X}\), we require that \[\Pi = \Pi R\] and \[\Delta\mathbf{x}=\Pi T\]

Recall that \(\Pi\) is given as \[\Pi=\frac1Z\cdot \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix} \] Note that the projection matrix depends on the \(Z\)-coordinate of the 3D point.

Since \(f\neq 0\) and \(Z\) is finite and non-zero, \(\Pi\) is invertible, so \(\Pi=\Pi R\) implies \(R=I\), i.e. there is no rotation.

Let us write \(\Delta\mathbf{x} = (\Delta x, \Delta y,0)\) and \(T^T=(x_t,y_t,z_t)\), to get \[\Delta x = \frac{fx_t}{Z}\quad \Delta y = \frac{fy_t}{Z}\] Furthermore, \[0=\frac{fz_t}{Z},\] and hence \(z_t=0\).

We conclude that the transformation \((R,T)\) has to be a translation parallel to the image plane.
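
The conclusion can be checked numerically with a small sketch. It assumes an ideal pinhole camera with the projection \((X,Y,Z)\mapsto(fX/Z, fY/Z)\) and places all scene points at the same depth \(Z\); the focal length, depth, points and translations are illustrative values, and `project` is a helper written for this example only.

```python
import numpy as np


def project(points_3d, f):
    """Ideal pinhole projection (X, Y, Z) -> (f X / Z, f Y / Z)."""
    points_3d = np.asarray(points_3d, dtype=float)
    return f * points_3d[:, :2] / points_3d[:, 2:3]


f = 2.0  # illustrative focal length (assumption)
Z = 5.0  # common depth of all scene points (assumption)
scene = np.array([[0, 0, Z], [1, 0, Z], [0, 1, Z], [1, 1, Z]], dtype=float)

# Translation parallel to the image plane (z_t = 0).
T_parallel = np.array([0.5, -0.25, 0.0])
shift_parallel = project(scene + T_parallel, f) - project(scene, f)

# Translation along the optical axis (z_t != 0), for contrast.
T_axial = np.array([0.0, 0.0, 1.0])
shift_axial = project(scene + T_axial, f) - project(scene, f)

print(shift_parallel)  # every row equals f * (x_t, y_t) / Z
print(shift_axial)     # rows differ, so this is not a single image shift
```

With \(z_t=0\) every point moves by the same \((fx_t/Z, fy_t/Z)\), matching the formulas above. With \(z_t\neq 0\) the displacement varies from point to point, so it cannot be written as one constant \(\Delta\mathbf{x}\).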