Revision 5aae4d2eccc435a1c6031d0233a10c38b3297f4c (click the page title to view the current version)

Multiscale Detection

Changes from 5aae4d2eccc435a1c6031d0233a10c38b3297f4c to b711f31936b0656e9fe3a22b1bd4b1aec98ea408

---
title: Multiscale Detection
categories: lecture
---

**Reading** Ma 2004 Chapter 11.2

# Principle

1. Consider motion tracking by differentiation.
2. Differentiation is based on infinitesimaly small changes.
   In practice it may work for motion up to 2-3 pixels/frame.
3. Advantage: can track subpixel movement.
4. Problems for
    - fast movement
    - high resolution
    - widely different camera angles
5. For widely different camera angles we need to look elsewhere,
   but for fast movement and high resolution, we can use
   *multiscale tracking*
6. **Example** Suppose the movement is 8 pixels/frame, clearly too much.
    - Downsample the image by a factor of ¼.
    - I.e. reduce the resolution by a quarter.
    - What is the movement rate now?
7. **General Idea**
    - too much data is a common problem;
      you can see it in machine learning as well
    - high resolution emphasises fine-grained detail
    - lower resolution makes coarser structures more evident

# Multiscale Pyramid

1. Two input frames $I_1$ and $I_2$.
2. Write $I_i^1=I_i$.
3. Form $I^k_i$ by downsampling $I^{k-1}_i$ by a factor of ½.
4. Two to four scales is normally sufficient
5. **Example** Starting with frames at $800\times640$,
   we form pyramids with versions with resolutions
   $800\times640$, $200\times160$, $400\times320$, and $100\times80$.
6. Smoothing is used at each downsampling step

# Procedure

## Coarsest level

1. Start with the coarsest level, that is the largest value of $k$.
2. Compare $I_1^k$ and $I_2^k$ to calculate the motion $d^k$ as usual.
3. Maintain the running total estimated motion $d\leftarrow d^k$.

## Subsequent levels

1. Consider level $k$.  We have estimated the motion in level $k+1$
   as $d$.  This corresponds to a motion of $2d$ in level $k$.
2. Comparing $I_1^k$ and $I_2^k$  may not work, because the movement
   is too large.  Therefore we adjust one iage to compensate for
   known motion.
3. Compute $\tilde I_2^k$ by shifting $I_2^k$ $2d$ pixels, i.e.
   $\tilde I_2^k=I_2^k(x+2d)$.
4. Using $I_1^k$ and $\tilde I_2^k$ we estimate the motion $d^k$
   as before.
5. The total estimated motion is updated $d\leftarrow 2d+d^k$.
   That is, we add the motion estimated to the motion removed.

Beware the sign of the movement.  Ma (Eq 11.2) computes
movement from image 2 to image 1, therefore we have a plus
$x+2d$ when we calculate $\tilde I$.  It is important to 
check that we have the movement in the right direction.

## End of stage 1

1. When we have completed level $k=1$ we have the total estimated
   motion as $d=d^1+2d^2+\cdots+2^{k-1}d^k$.

## Stage 2. Fine tuning

1. Fine tuning is done at the finest scale ($I^1_1,I^1_2$).
2. Make the image $\tilde I_2=I_2(x+d)$.
3. If $d$ is correctly estimated, these two images should be equal.
4. They probably are not.
5. Estimate the estimation error $\Delta d$ as movement for the
   pair $I_1,\hat I_2$.
6. Adjust the estimate $d\gets d+\Delta d$.
7. Repeat from Step 5 until the correction term is small enough
   (or you it diverges and you conclude that this does not work).

# Closing remark

There are several lessons to take home here.

1.  Obviously, you may want to do multiscale motion tracking.
2.  Resolution matters.  You may have to decrease resolution
    to make a machine vision system work.
3.  Multiscale is a common idea which can be found in other
    areas of signal processing.
4.  The iterative fine tuning that we do is a common principle
    in numerical analysis (computational science) and is worth
    being aware of.