Multiscale Detection
Reading Ma 2004 Chapter 11.2
Principle
- Consider motion tracking by differentiation.
- Differentiation is based on infinitesimaly small changes. In practice it may work for motion up to 2-3 pixels/frame.
- Advantage: can track subpixel movement.
- Problems for
- fast movement
- high resolution
- widely different camera angles
- For widely different camera angles we need to look elsewhere, but for fast movement and high resolution, we can use multiscale tracking
- Example Suppose the movement is 8 pixels/frame, clearly too much.
- Downsample the image by a factor of ¼.
- I.e. reduce the resolution by a quarter.
- What is the movement rate now?
Multiscale Pyramid
- Two input frames \(I_1\) and \(I_2\).
- Write \(I_i^1=I_i\).
- Form \(I^k_i\) by downsampling \(I^{k-1}_i\) by a factor of ½.
- Two to four scales is normally sufficient
- Example Starting with frames at \(800\times640\), we form pyramids with versions with resolutions \(800\times640\), \(200\times160\), \(400\times320\), and \(100\times80\).
- Smoothing is used at each downsampling step
Procedure
Coarsest level
- Start with the coarsest level, that is the largest value of \(k\).
- Compare \(I_1^k\) and \(I_2^k\) to calculate the motion \(d^k\) as usual.
- Maintain the running total estimated motion \(d\leftarrow d^k\).
Subsequent levels
- Consider level \(k\). We have estimated the motion in level \(k+1\) as \(d\). This corresponds to a motion of \(2d\) in level \(k\).
- Comparing \(I_1^k\) and \(I_2^k\) may not work, because the movement is too large. Therefore we adjust one iage to compensate for known motion.
- Compute \(\tilde I_2^k\) by shifting \(I_2^k\) \(2d\) pixels, i.e. \(\tilde I_2^k=I_2^k(x+2d)\).
- Using \(I_1^k\) and \(\tilde I_2^k\) we estimate the motion \(d^k\) as before.
- The total estimated motion is updated \(d\leftarrow 2d+d^k\). That is, we add the motion estimated to the motion removed.
Beware the sign of the movement. Ma (Eq 11.2) computes movement from image 2 to image 1, therefore we have a plus \(x+2d\) when we calculate \(\tilde I\). It is important to check that we have the movement in the right direction.
End of stage 1
- When we have completed level \(k=1\) we have the total estimated motion as \(d=d^1+2d^2+\cdots+2^{k-1}d^k\).
Stage 2. Fine tuning
- Fine tuning is done at the finest scale (\(I^1_1,I^1_2\)).
- Make the image \(\tilde I_2=I_2(x+d)\).
- If \(d\) is correctly estimated, these two images should be equal.
- They probably are not.
- Estimate the estimation error \(\Delta d\) as movement for the pair \(I_1,\hat I_2\).
- Adjust the estimate \(d\gets d+\Delta d\).
- Repeat from Step 5 until the correction term is small enough (or you it diverges and you conclude that this does not work).
Closing remark
There are several lessons to take home here.
- Obviously, you may want to do multiscale motion tracking.
- Resolution matters. You may have to decrease resolution to make a machine vision system work.
- Multiscale is a common idea which can be found in other areas of signal processing.
- The iterative fine tuning that we do is a common principle in numerical analysis (computational science) and is worth being aware of.