Multiscale Detection

Reading Ma 2004 Chapter 11.2


  1. Consider motion tracking by differentiation.
  2. Differentiation is based on infinitesimaly small changes. In practice it may work for motion up to 2-3 pixels/frame.
  3. Advantage: can track subpixel movement.
  4. Problems for
    • fast movement
    • high resolution
    • widely different camera angles
  5. For widely different camera angles we need to look elsewhere, but for fast movement and high resolution, we can use multiscale tracking
  6. Example Suppose the movement is 8 pixels/frame, clearly too much.
    • Downsample the image by a factor of ¼.
    • I.e. reduce the resolution by a quarter.
    • What is the movement rate now?

Multiscale Pyramid

  1. Two input frames \(I_1\) and \(I_2\).
  2. Write \(I_i^1=I_i\).
  3. Form \(I^k_i\) by downsampling \(I^{k-1}_i\) by a factor of ½.
  4. Two to four scales is normally sufficient
  5. Example Starting with frames at \(800\times640\), we form pyramids with versions with resolutions \(800\times640\), \(200\times160\), \(400\times320\), and \(100\times80\).
  6. Smoothing is used at each downsampling step


Coarsest level

  1. Start with the coarsest level, that is the largest value of \(k\).
  2. Compare \(I_1^k\) and \(I_2^k\) to calculate the motion \(d^k\) as usual.
  3. Maintain the running total estimated motion \(d\leftarrow d^k\).

Subsequent levels

  1. Consider level \(k\). We have estimated the motion in level \(k+1\) as \(d\). This corresponds to a motion of \(2d\) in level \(k\).
  2. Comparing \(I_1^k\) and \(I_2^k\) may not work, because the movement is too large. Therefore we adjust one iage to compensate for known motion.
  3. Compute \(\tilde I_2^k\) by shifting \(I_2^k\) \(2d\) pixels, i.e. \(\tilde I_2^k=I_2^k(x+2d)\).
  4. Using \(I_1^k\) and \(\tilde I_2^k\) we estimate the motion \(d^k\) as before.
  5. The total estimated motion is updated \(d\leftarrow 2d+d^k\). That is, we add the motion estimated to the motion removed.

Beware the sign of the movement. Ma (Eq 11.2) computes movement from image 2 to image 1, therefore we have a plus \(x+2d\) when we calculate \(\tilde I\). It is important to check that we have the movement in the right direction.

End of stage 1

  1. When we have completed level \(k=1\) we have the total estimated motion as \(d=d^1+2d^2+\cdots+2^{k-1}d^k\).

Stage 2. Fine tuning

  1. Fine tuning is done at the finest scale (\(I^1_1,I^1_2\)).
  2. Make the image \(\tilde I_2=I_2(x+d)\).
  3. If \(d\) is correctly estimated, these two images should be equal.
  4. They probably are not.
  5. Estimate the estimation error \(\Delta d\) as movement for the pair \(I_1,\hat I_2\).
  6. Adjust the estimate \(d\gets d+\Delta d\).
  7. Repeat from Step 5 until the correction term is small enough (or you it diverges and you conclude that this does not work).

Closing remark

There are several lessons to take home here.

  1. Obviously, you may want to do multiscale motion tracking.
  2. Resolution matters. You may have to decrease resolution to make a machine vision system work.
  3. Multiscale is a common idea which can be found in other areas of signal processing.
  4. The iterative fine tuning that we do is a common principle in numerical analysis (computational science) and is worth being aware of.