
Project Tracker

Briefing: Multiscale Detection

Exercise

  • Implement a prototype able to track one simple object in a video scene.
  • Your program should be able to tell (print out) the position of the object; i.e. it is not sufficient to visualise detected features, as many programming examples do. Imagine you want to use the information to control a robot to pick up the object, for instance.
  • You have full freedom to do this as you please, but bear in mind that we only ask for a prototype.
  • The goal is to learn constituent techniques, not to make a full solution for production.
  • It is a good idea to show your prototype in the exam, but we will be asking you how it works and/or how you would make it work in practice.
  • There is a week of self study to give you time to complete the project.
  • There is a separate page with some Project Tips to consult when you need it.

Step 1. Dataset.

  1. You can set up your own scene, recording your own video, with a characteristic, brightly coloured object moving through the scene, e.g. a bright red ball rolling on the floor.
  2. Start with two consecutive frames from the video, as in the sketch below.
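
If you use OpenCV to read the video (an assumption, not a requirement; any video library will do), grabbing the two frames can be as simple as the following sketch. The file name is hypothetical.

    import cv2

    cap = cv2.VideoCapture("ball.mp4")   # hypothetical file name for your own recording
    ok1, frame1 = cap.read()             # current frame
    ok2, frame2 = cap.read()             # next frame
    cap.release()
    assert ok1 and ok2, "could not read two frames from the video"

    # Greyscale copies are convenient for the derivative computations in Step 3.
    grey1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
    grey2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)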

Step 2. Feature Detector.

  1. Start with the feature detector. Make sure that it works. You may use a library function.
  2. Can you use the feature detector to detect your particular object in a still image?
  3. Visualise the detected object by drawing a frame around it in the image, as in the sketch below.
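
A crude still-image prototype, assuming OpenCV's Harris detector and the frames from Step 1, could look like the sketch below. It simply boxes every corner pixel above a threshold, which only makes sense in a simple scene with a single corner-rich object; the parameters are assumptions to be tuned on your own video.

    import cv2
    import numpy as np

    # Harris response map and a crude threshold; every pixel above it counts as a corner.
    response = cv2.cornerHarris(np.float32(grey1), blockSize=2, ksize=3, k=0.04)
    corners = np.argwhere(response > 0.01 * response.max())   # (row, col) pairs

    # Bounding box around all detected corner points.
    (y0, x0), (y1, x1) = corners.min(axis=0), corners.max(axis=0)
    cv2.rectangle(frame1, (int(x0), int(y0)), (int(x1), int(y1)), (0, 255, 0), 2)

    # Report the position, e.g. the centre of the box, rather than only drawing it.
    print("object centre (x, y):", ((x0 + x1) / 2, (y0 + y1) / 2))

    cv2.imshow("detected object", frame1)
    cv2.waitKey(0)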

Step 3. Tracker

  1. Introduce tracking only when you have a working prototype for still images.

This pseudo-code builds on the modularisation ideas in the original exercise text, see Edges. A code sketch follows the list.

  1. Find the corners (use the Harris detector)
  2. Calculate the spatial derivatives \(I_x,I_y\) using the Sobel filter.
  3. Calculate the temporal derivative \(I_t\) using the first order approximation of the difference between the next and the current frame.
  4. Calculate the element-wise products \(I_xI_t\), \(I_yI_t\), \(I_x^2\), \(I_xI_y\), \(I_y^2\), which we will use later.
  5. For each corner \(\mathbf{x}\),
    • calculate the \(G\) matrix and \(b\) vector
    • solve \(\mathbf{u} = -G^{-1}b\). Note that this is a matrix-vector product, not an element-wise one.
    • use the numpy.linalg library to invert \(G\).
  6. Plot the features \(\mathbf{x}\) and the vectors \(\mathbf{u}\) in the image.
  7. Calculate the feature points from the next frame and plot them in the current frame (together with the vectors and points above).
    • Do the new positions fit with the previous positions and motion vectors?
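
A minimal sketch of the steps above, assuming numpy, OpenCV and matplotlib, and the greyscale frames grey1 and grey2 from Step 1. The window size, Harris parameters and singularity threshold are assumptions to be tuned; corner thinning (non-maximum suppression) is left out.

    import cv2
    import numpy as np
    import matplotlib.pyplot as plt

    I = np.float32(grey1) / 255.0
    J = np.float32(grey2) / 255.0

    # 1. Corners from the Harris detector (crude threshold on the response map).
    response = cv2.cornerHarris(I, blockSize=2, ksize=3, k=0.04)
    corners = np.argwhere(response > 0.01 * response.max())   # (row, col) pairs

    # 2.-3. Spatial derivatives (Sobel) and first-order temporal derivative.
    Ix = cv2.Sobel(I, cv2.CV_32F, 1, 0, ksize=3)
    Iy = cv2.Sobel(I, cv2.CV_32F, 0, 1, ksize=3)
    It = J - I

    # 4. Element-wise products used when accumulating G and b.
    Ixx, Ixy, Iyy = Ix * Ix, Ix * Iy, Iy * Iy
    Ixt, Iyt = Ix * It, Iy * It

    w = 5   # half-width of the window around each corner (an assumption)
    plt.imshow(grey1, cmap="gray")
    for (r, c) in corners:
        if r < w or c < w or r >= I.shape[0] - w or c >= I.shape[1] - w:
            continue   # skip corners too close to the image border
        win = (slice(r - w, r + w + 1), slice(c - w, c + w + 1))
        # 5. Structure tensor G and vector b summed over the window.
        G = np.array([[Ixx[win].sum(), Ixy[win].sum()],
                      [Ixy[win].sum(), Iyy[win].sum()]])
        b = np.array([Ixt[win].sum(), Iyt[win].sum()])
        if np.linalg.det(G) < 1e-6:
            continue   # G is (nearly) singular; no reliable motion estimate here
        u = -np.linalg.inv(G) @ b   # displacement (u_x, u_y)
        # 6. Plot the feature point and its motion vector.
        plt.plot(c, r, "r.")
        plt.arrow(c, r, u[0], u[1], color="yellow", head_width=1)
    plt.show()

For step 7, run the Harris detector on grey2 as well and plot those corners in the same figure; they should land roughly where the motion vectors point.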

Step 4. Multiscale Tracking (optional)

Step 5. Continuous Tracking (optional)

If everything works fine from one frame to the next, you can try to repeat the operation to track feature points throughout a video.
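
A rough outline of that loop, assuming OpenCV and two hypothetical helpers: detect_corners() wrapping the Harris step from Step 2, and track_step() wrapping the per-frame motion estimate from Step 3.

    import cv2

    cap = cv2.VideoCapture("ball.mp4")            # hypothetical file name
    ok, frame = cap.read()
    prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    points = detect_corners(prev)                 # hypothetical helper: Harris corners

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        curr = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        points = track_step(prev, curr, points)   # hypothetical helper: move each point by its estimated u
        print("tracked positions:", points)
        prev = curr
    cap.release()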

Debrief

  1. How did you fare?
  2. Exam planning.

Tracking in Practice

We have seen two broad methods used in class.

  1. Find features and draw a bounding box, for each frame independently.
  2. Use the temporal derivative to estimate motion.

The first approach disregards the history completely. The result is that it does not always find the same points in every frame. Drawing the bounding box, one will see it jumping to a different object in some frames.

The second approach uses only the history. Since there is a lot of noise and approximation in the problem, it is not going to be perfect. If a feature point is lost, we may end up tracking from a non-feature point where \(G\) is not invertible.

In practice, one will have to combine approaches, if it is necessary to identify and track individual objects. There are other tools in the box:

  1. Feature Descriptors can be used to match features between frames. We shall discuss SIFT tomorrow; a small preview sketch follows this list.
  2. Other heuristics can be used to identify objects (e.g. objects bounded by more than one feature point or edge).
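
As a small preview of descriptor matching, assuming OpenCV with SIFT available (OpenCV 4.4+ or opencv-contrib) and the greyscale frames from Step 1: the idea is to re-identify the same physical feature in the next frame by comparing descriptors, rather than relying on motion alone.

    import cv2

    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(grey1, None)
    kp2, des2 = sift.detectAndCompute(grey2, None)

    # Brute-force matching of descriptors; cross-checking keeps only mutual best matches.
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

    # The ten best matches, printed as corresponding positions in the two frames.
    for m in matches[:10]:
        x1, y1 = kp1[m.queryIdx].pt
        x2, y2 = kp2[m.trainIdx].pt
        print(f"({x1:.1f}, {y1:.1f}) -> ({x2:.1f}, {y2:.1f})")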

Complete mastery, or the solution of particular practical problems, is beyond the scope of a 7½-point module, but could be a suitable final-year project for next semester. In this module, the goal is to understand some of the building blocks in sufficient depth to be able to tweak and combine them when needed. The solutions to the individual toy problems are not that important; the experience is.