Project Tracker

Briefing: Multiscale Detection


  • Implement a prototype able to track one simple object in a video scene.
  • Your program should be able to tell (print out) the position of the object; i.e. it is not sufficient to visualise detected features, as many programming examples do. Imagine you want to use the information to control a robot to pick up the object, for instance.
  • You have full freedom to do this as you please, but bear in mind that we only ask for a prototype.
  • The goal is to learn constituent techniques, not to make a full solution for production.
  • It is a good idea to show your prototype in the exam, but we will be asking you how it works and/or how you would make it work in practice.
  • There is a week of self-study to give you time to complete the project.
  • There is a separate page with some Project Tips to consult when you need it.

Step 1. Dataset.

  1. You can set up your own scene, recording your own video, with a characteristic, brightly coloured object moving through the scene. E.g. a bright, red ball rolling on the floor.
  2. Start with two consecutive frames from the video.

Step 2. Feature Detector.

  1. Start with the feature detector. Make sure that it works. You may use a library function.
  2. Can you use the feature detector to detect your particular object in a still image?
  3. Visualise the detected object, by drawing a frame around it in the image.

Step 3. Tracker.

  1. Introduce tracking only when you have a working prototype for still images.

This pseudo-code builds on the modularisation ideas in the original exercise text; see Edges.

  1. Find the corners (use the Harris detector).
  2. Calculate the spatial derivatives \(I_x,I_y\) using the Sobel filter.
  3. Calculate the temporal derivative \(I_t\) using the first order approximation of the difference between the next and the current frame.
  4. Calculate the element-wise products \(I_xI_t\), \(I_yI_t\), \(I_x^2\), \(I_xI_y\), \(I_y^2\), which we will use later.
  5. For each corner \(\mathbf{x}\),
    • calculate the \(G\) matrix and \(b\) vector
    • solve \(\mathbf{u} = -G^{-1}b\). Note that this is a matrix product.
    • use the numpy.linalg library to invert \(G\).
  6. Plot the features \(\mathbf{x}\) and the vectors \(\mathbf{u}\) in the image.
    • for plotting the vectors, matplotlib's quiver function is a useful option
    • Note that the vector \(\mathbf{u}\) tells us the speed and direction of the point, and thus gives crucial information for analysing the behaviour of objects in a scene. Many applications will use this information in its own right, and not just use it to recover the same point in the next frame.
  7. Calculate the feature points from the next frame and plot them in the current frame (together with the vectors and points above).
    • Do the new positions fit with the previous positions and motion vectors?

Step 4. Multiscale Tracking (optional)

Step 5. Continuous Tracking (optional)

If everything works fine from one frame to the next, you can try to repeat the operation to track feature points throughout a video.

Step 6. Feature Descriptors (optional)

Feature Descriptors, to be discussed next session, allow us to match features between frames or images, even when tracking is not possible.

In some applications we use feature descriptors instead of tracking, because tracking is not feasible.

In other applications, we may want to use continuous tracking, but because tracking occasionally fails, we may need feature descriptors to recover from errors.

  1. You should probably first test feature descriptors on widely different frames, e.g. movements of ten pixels or so.
  2. If this works, you can add feature descriptors to your tracking system to validate that the correct object is being tracked, e.g. with a check every second.


  1. How did you fare?
  2. Exam planning.