# Multiscale Detection

**Reading** Ma 2004 Chapter 11.2

# Principle

- Consider motion tracking by differentiation.
- Differentiation is based on infinitesimally small changes. In practice it may work for motion up to 2-3 pixels/frame.
- Advantage: can track subpixel movement.
- Problems for
  - fast movement
  - high resolution
  - widely different camera angles

- For widely different camera angles we need to look elsewhere, but for fast movement and high resolution, we can use *multiscale tracking*.

**Example** Suppose the movement is 8 pixels/frame, clearly too much.

- Downsample the image by a factor of ¼.
- I.e. reduce the resolution to a quarter.
- What is the movement rate now?
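
Motion measured in pixels scales with the resolution, so for the example above the arithmetic works out as

\[
8\ \text{pixels/frame} \times \tfrac14 = 2\ \text{pixels/frame},
\]

which is back inside the 2-3 pixels/frame range where differentiation may work.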

# Multiscale Pyramid

- Two input frames \(I_1\) and \(I_2\).
- Write \(I_i^1=I_i\).
- Form \(I^k_i\) by downsampling \(I^{k-1}_i\) by a factor of ½.
- Two to four scales is normally sufficient.
- Smoothing is used at each downsampling step.

**Example** Starting with frames at \(800\times640\), we form pyramids with versions with resolutions \(800\times640\), \(400\times320\), \(200\times160\), and \(100\times80\).
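
The pyramid construction can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the names `downsample_half` and `build_pyramid` are made up for this example, and averaging each 2×2 block is only one simple (assumed) choice of smoothing. A Gaussian filter is another common choice.

```python
import numpy as np

def downsample_half(image):
    """Halve the resolution by averaging each 2x2 block (crude smoothing)."""
    h, w = image.shape
    h2, w2 = h // 2, w // 2
    # Crop to even dimensions, then average non-overlapping 2x2 blocks.
    blocks = image[:2 * h2, :2 * w2].reshape(h2, 2, w2, 2)
    return blocks.mean(axis=(1, 3))

def build_pyramid(image, levels):
    """Return [I^1, I^2, ..., I^levels], finest level first."""
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        pyramid.append(downsample_half(pyramid[-1]))
    return pyramid

# A random 800x640 frame; NumPy shapes are (rows, columns) = (height, width).
frame = np.random.default_rng(0).random((640, 800))
pyramid = build_pyramid(frame, levels=4)
shapes = [level.shape for level in pyramid]
print(shapes)  # [(640, 800), (320, 400), (160, 200), (80, 100)]
```

Note that list index 0 holds level \(k=1\), so the coarsest level is the last element.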

# Procedure

## Coarsest level

- Start with the coarsest level, that is the largest value of \(k\).
- Compare \(I_1^k\) and \(I_2^k\) to calculate the motion \(d^k\) as usual.
- Maintain the running total estimated motion \(d\leftarrow d^k\).

## Subsequent levels

- Consider level \(k\). We have estimated the motion in level \(k+1\) as \(d\). This corresponds to a motion of \(2d\) in level \(k\).
- Comparing \(I_1^k\) and \(I_2^k\) may not work, because the movement is too large. Therefore we adjust one image to compensate for the known motion.
- Compute \(\tilde I_2^k\) by shifting \(I_2^k\) by \(2d\) pixels, i.e. \(\tilde I_2^k(x)=I_2^k(x+2d)\).
- Using \(I_1^k\) and \(\tilde I_2^k\) we estimate the motion \(d^k\) as before.
- The total estimated motion is updated \(d\leftarrow 2d+d^k\). That is, we add the motion estimated to the motion removed.

Beware the sign of the movement. Ma (Eq 11.2) computes the movement from image 2 to image 1, which is why there is a plus sign in \(x+2d\) when we calculate \(\tilde I_2^k\). It is important to check that the movement goes in the right direction.
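
The coarse-to-fine loop can be sketched as follows. Several things here are assumptions made to keep the example short and exact: the per-level estimator `estimate_small_motion` is a brute-force integer search over shifts of at most 2 pixels, standing in for the differential estimator; `np.roll` is used for shifting, so the image wraps around at the borders; and the test frames are random noise with a known shift. All function names are hypothetical. The sign convention follows the text, \(I_1(x)=I_2(x+d)\).

```python
import numpy as np

def downsample_half(image):
    """Halve the resolution by averaging 2x2 blocks (even dimensions assumed)."""
    h, w = image.shape
    return image.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def warp(image, d):
    """Return the image x -> image(x + d), wrapping around at the borders."""
    return np.roll(image, shift=(-d[0], -d[1]), axis=(0, 1))

def estimate_small_motion(i1, i2, radius=2):
    """Stand-in estimator: the integer d, |d| <= radius, with i1(x) ~ i2(x + d)."""
    candidates = [(dy, dx)
                  for dy in range(-radius, radius + 1)
                  for dx in range(-radius, radius + 1)]
    errors = [np.sum((i1 - warp(i2, c)) ** 2) for c in candidates]
    return np.array(candidates[int(np.argmin(errors))])

def coarse_to_fine(i1, i2, levels):
    """Stage 1: accumulate the motion estimate from the coarsest level down."""
    pyr1, pyr2 = [i1], [i2]
    for _ in range(levels - 1):
        pyr1.append(downsample_half(pyr1[-1]))
        pyr2.append(downsample_half(pyr2[-1]))
    # Coarsest level: d <- d^k
    d = estimate_small_motion(pyr1[-1], pyr2[-1])
    # Subsequent levels, from the second coarsest down to the finest.
    for k in range(levels - 2, -1, -1):
        d = 2 * d                                  # motion doubles at the finer level
        i2_tilde = warp(pyr2[k], d)                # remove the known motion
        d = d + estimate_small_motion(pyr1[k], i2_tilde)  # d <- 2d + d^k
    return d

rng = np.random.default_rng(0)
frame1 = rng.random((64, 64))
true_d = np.array([8, -4])
# frame2(x) = frame1(x - d), so that frame1(x) = frame2(x + d).
frame2 = np.roll(frame1, shift=tuple(true_d), axis=(0, 1))

estimate = coarse_to_fine(frame1, frame2, levels=3)
print(estimate)
```

With three levels the shift of \((8,-4)\) pixels becomes \((2,-1)\) at the coarsest level, which the small-motion estimator can handle. Running the same estimator directly on the full-resolution frames cannot recover the shift, because 8 pixels is outside its search range.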

## End of stage 1

- When we have completed level \(k=1\) we have the total estimated motion as \(d=d^1+2d^2+\cdots+2^{K-1}d^K\), where \(K\) is the coarsest level.
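
As a worked instance of the update rule \(d\leftarrow 2d+d^k\), assuming three levels, the running total unrolls as follows:

\[
\begin{aligned}
\text{after level 3:}\quad & d = d^3\\
\text{after level 2:}\quad & d = 2d^3 + d^2\\
\text{after level 1:}\quad & d = 2(2d^3+d^2) + d^1 = d^1 + 2d^2 + 4d^3
\end{aligned}
\]

Each coarse estimate is doubled once for every level below it, which is where the powers of two come from.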

## Stage 2. Fine tuning

- Fine tuning is done at the finest scale (\(I^1_1,I^1_2\)).
- Make the image \(\tilde I_2\), where \(\tilde I_2(x)=I_2(x+d)\).
- If \(d\) is correctly estimated, these two images should be equal.
- They probably are not.
- Estimate the error \(\Delta d\) as the movement for the pair \(I_1,\tilde I_2\).
- Adjust the estimate \(d\gets d+\Delta d\).
- Repeat from the step where \(\tilde I_2\) is formed until the correction term is small enough (or it diverges and you conclude that this does not work).
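
The fine-tuning iteration can be illustrated on a one-dimensional signal, which stands in for the image to keep the sketch short. Everything specific here is an assumption for illustration: the function name `fine_tune`, the Gaussian test signal, linear interpolation via `np.interp` for the subpixel shift, a single global least-squares step as the differential estimator, and the starting value 7.0 standing in for the output of stage 1. The sign convention is \(I_1(x)=I_2(x+d)\), as in the text.

```python
import numpy as np

def fine_tune(i1, i2, d, tol=1e-3, max_iter=20):
    """Refine d so that i1(x) ~ i2(x + d), using first-order (differential) steps."""
    x = np.arange(i1.size, dtype=float)
    grad = np.gradient(i1)                       # spatial derivative of I1
    for _ in range(max_iter):
        i2_tilde = np.interp(x + d, x, i2)       # I2_tilde(x) = I2(x + d)
        # Linearise I2_tilde(x) ~ I1(x) - I1'(x) * delta and solve by least squares.
        delta = -np.sum(grad * (i2_tilde - i1)) / np.sum(grad ** 2)
        d = d + delta                            # d <- d + delta
        if abs(delta) < tol:                     # correction term small enough
            return d
    raise RuntimeError("fine tuning did not converge")

x = np.arange(200, dtype=float)
true_d = 7.4
signal1 = np.exp(-(x - 100.0) ** 2 / (2 * 5.0 ** 2))            # smooth bump
signal2 = np.exp(-(x - 100.0 - true_d) ** 2 / (2 * 5.0 ** 2))   # I2(x) = I1(x - d)

d_hat = fine_tune(signal1, signal2, d=7.0)   # 7.0: assumed result of stage 1
print(d_hat)
```

The loop recovers the subpixel part of the motion that the coarse stage cannot resolve. The iteration cap with an exception is one way to handle the divergent case mentioned above.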

# Closing remark

There are several lessons to take home here.

- Obviously, you may want to do multiscale motion tracking.
- Resolution matters. You may have to decrease resolution to make a machine vision system work.
- Multiscale is a common idea which can be found in other areas of signal processing.
- The iterative fine tuning that we do is a common principle in numerical analysis (computational science) and is worth being aware of.