Lecture - Real World Reconstruction
Reading Ma 2004:Ch 11
Overview
Feature Detection
- Harris Detector
- Tiling
- divide the image into, say, \(10\times10\) tiles
- select features per tile
- Separation
- one feature may cause several pixels to be marked as a corner
- separation should be larger than the window size used in detection
- Sorting by strength
- sort first, and then select from the top of the list
- enforce separation from previously selected features
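The selection rules above (tiling, separation, sorting by strength) can be combined in one greedy pass. The following is a minimal sketch, not the textbook's algorithm: the function name `select_features`, its parameters, and the input format (a list of `(x, y, strength)` tuples, for example thresholded Harris responses) are assumptions made for illustration.

```python
def select_features(corners, width, height, grid=10, per_tile=1, min_sep=5.0):
    """Greedy feature selection from candidate corners.

    corners: list of (x, y, strength) tuples (hypothetical input format).
    The image is divided into grid x grid tiles; at most per_tile features
    are kept in each tile, and every kept feature must lie at least min_sep
    pixels away from all previously kept features.
    """
    selected = []
    count_in_tile = {}
    # Sort by strength first, then select from the top of the list.
    for x, y, s in sorted(corners, key=lambda c: c[2], reverse=True):
        tile = (min(int(x * grid / width), grid - 1),
                min(int(y * grid / height), grid - 1))
        if count_in_tile.get(tile, 0) >= per_tile:
            continue  # this tile already has its quota
        too_close = any((x - sx) ** 2 + (y - sy) ** 2 < min_sep ** 2
                        for sx, sy, _ in selected)
        if too_close:
            continue  # probably the same corner marked at a neighbouring pixel
        selected.append((x, y, s))
        count_in_tile[tile] = count_in_tile.get(tile, 0) + 1
    return selected


# Made-up candidates: several responses cluster around the corner at (10, 10).
candidates = [(10, 10, 0.9), (9, 9, 0.85), (11, 10, 0.8), (12, 11, 0.7),
              (90, 20, 0.6), (50, 50, 0.5), (14, 14, 0.4)]
features = select_features(candidates, width=100, height=100)
```

Here the cluster around (10, 10) collapses to its strongest response: (9, 9) lies in a different tile but is rejected by the separation test, while the remaining neighbours are rejected by the per-tile quota.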
Feature Correspondence
- Small Baseline (motion video)
- feature tracking - calculate motion
- Moderate Baseline (snapshots)
- Wide Baseline -> use SIFT or similar methods
- the textbook is outdated on this point
Basic Tracker
- Recall the use of the gradient
- Temporal derivative \(I_t\) approximated by the difference between consecutive frames, \(I^2-I^1\)
- Displacement over 2–3 pixels \(\to\) the first-order approximation does not suffice
- Therefore, we use a multiscale approach
- Successively smooth and downsample
- Tracking at a coarser scale works for larger displacements (each coarse pixel covers more of the original image, so the same motion spans fewer pixels)
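As a reminder of how the gradient enters, the first-order tracker can be written out as follows. This is a sketch in the usual textbook form; the window \(W\), the displacement \(d=(u,v)^\top\), and the symbols \(G\) and \(b\) are notation assumed here rather than taken from these notes. Linearising the brightness constancy assumption \(I^2(x+d)=I^1(x)\) around \(d=0\) gives, for every pixel in the window,

\[
I_x u + I_y v + I_t \approx 0, \qquad I_t \approx I^2 - I^1,
\]

and the least-squares solution over the window is

\[
G = \sum_{W}\begin{bmatrix} I_x^2 & I_x I_y\\ I_x I_y & I_y^2 \end{bmatrix},
\qquad
b = \sum_{W}\begin{bmatrix} I_x I_t\\ I_y I_t \end{bmatrix},
\qquad
d = -G^{-1} b .
\]

The linearisation is only trustworthy when \(d\) is small compared with the scale of the image structure, which is why larger displacements are handled at coarser scales.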
Multiscale iterative feature tracking
- Track in the coarsest scale first.
- Shift the image according to the displacement.
- Repeat the tracking in the next scale, and repeat for every scale.
- Add together the displacement, correcting for the downsampling factor.
- Two to four scales typically suffice, but this may depend on the original resolution and frame rate
- the textbook is old, and modern image resolutions may require more scales
- Refinement
- Iteration in the finest scale
- Use warped/interpolated version of the next frame
- Successively improve the estimate
- Subpixel accuracy
- Algorithm 11.2
- Caveat: drift, i.e. tracking errors propagate and accumulate over frames.
- Compensate by feature matching
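The coarse-to-fine procedure above can be illustrated on a one-dimensional signal, where the whole tracker fits in a few lines. This is a simplified sketch and not Algorithm 11.2: the function names, the \([1,2,1]/4\) smoothing kernel, the downsampling factor of 2, and the synthetic Gaussian test signal are all assumptions made for illustration. It assumes the second signal is a shifted copy of the first, \(s_2(x+d)=s_1(x)\).

```python
import math


def smooth_downsample(sig):
    """Smooth with a [1, 2, 1]/4 kernel, then keep every second sample."""
    n = len(sig)
    smooth = [(sig[max(i - 1, 0)] + 2 * sig[i] + sig[min(i + 1, n - 1)]) / 4.0
              for i in range(n)]
    return smooth[::2]


def sample(sig, x):
    """Linearly interpolate sig at a real-valued position (clamped to the ends)."""
    x = min(max(x, 0.0), len(sig) - 1.0)
    i = min(int(x), len(sig) - 2)
    a = x - i
    return (1.0 - a) * sig[i] + a * sig[i + 1]


def track_step(s1, s2, d):
    """One first-order update of the shift estimate d.

    Assumes s1 is not constant, so the gradient energy is non-zero.
    """
    num = den = 0.0
    for x in range(1, len(s1) - 1):
        gx = (s1[x + 1] - s1[x - 1]) / 2.0   # spatial derivative
        it = sample(s2, x + d) - s1[x]       # temporal difference after warping
        num += gx * it
        den += gx * gx
    return d - num / den


def track_multiscale(s1, s2, levels=3, iters=5):
    """Track at the coarsest scale first, then refine at each finer scale."""
    pyramid = [(s1, s2)]
    for _ in range(levels - 1):
        a, b = pyramid[-1]
        pyramid.append((smooth_downsample(a), smooth_downsample(b)))
    d = 0.0
    for a, b in reversed(pyramid):
        d *= 2.0                   # correct for the downsampling factor
        for _ in range(iters):     # iterative refinement on the warped signal
            d = track_step(a, b, d)
    return d


# Synthetic example: a Gaussian bump shifted by 10 samples.
n = 128
s1 = [math.exp(-((x - 55) ** 2) / (2 * 8.0 ** 2)) for x in range(n)]
s2 = [math.exp(-((x - 65) ** 2) / (2 * 8.0 ** 2)) for x in range(n)]
estimate = track_multiscale(s1, s2)
single_step = track_step(s1, s2, 0.0)
```

On this example a single first-order step at the finest scale (`single_step`) underestimates the shift, whereas the coarse-to-fine estimate ends up close to the true value of 10 samples. Drift between frames is not modelled here.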
Projective Reconstruction
Calibration
- Intrinsic
- Extrinsic
- Non-linear
Note: The calibration tutorial focused on non-linear calibration. This is separate from the rest of the system, and unrelated to all the other calibrations and transformations discussed in the module.
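To make the split between intrinsic and extrinsic parameters concrete, here is a minimal sketch of the ideal pinhole projection \(\lambda x = K[R,T]X\), ignoring non-linear lens distortion. The function name `project` and all numerical values (focal length, principal point, pose) are made up for illustration.

```python
def project(K, R, T, X):
    """Project a 3D point X (world frame) to pixel coordinates.

    K: 3x3 intrinsic matrix, R: 3x3 rotation, T: translation (extrinsics).
    Lens distortion (the non-linear part) is deliberately left out.
    """
    # Extrinsic part: world coordinates -> camera coordinates.
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + T[i] for i in range(3)]
    # Intrinsic part: camera coordinates -> homogeneous pixel coordinates.
    h = [sum(K[i][j] * Xc[j] for j in range(3)) for i in range(3)]
    # Divide by the depth-dependent scale lambda.
    return (h[0] / h[2], h[1] / h[2])


# Hypothetical camera: focal length 800 px, principal point (320, 240).
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
T = [0.5, 0.0, 0.0]

pixel = project(K, R, T, [0.5, -0.5, 2.0])
```

With these values the point lands at pixel (720, 40): the camera-frame coordinates are (1, -0.5, 2), and dividing by the depth 2 before applying the focal length and principal point gives the result.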
Projective Reconstruction (Alg 11.6)
If we have the intrinsic camera matrix, we can do a Euclidean reconstruction straight away.
If not, the known algorithms only provide a projective reconstruction.
- Eight-point algorithm to find \(F\)
- Recover a pair of projection matrices from \(F\) (the uncalibrated counterpart of recovering \([R,T]\) from \(E\)).
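A sketch of how the two steps fit together, using standard results. The symbols \(x_1^j, x_2^j\) (corresponding image points in homogeneous coordinates), \(\chi\), \(F^s\), \(T'\), and \(K_1, K_2\) are notation assumed here. Each correspondence satisfies the epipolar constraint, which is linear in the entries of \(F\):

\[
(x_2^j)^\top F\, x_1^j = 0
\quad\Longleftrightarrow\quad
(x_1^j \otimes x_2^j)^\top F^s = 0,
\]

where \(F^s\) stacks the columns of \(F\). Stacking the rows \((x_1^j \otimes x_2^j)^\top\) for eight or more correspondences into a matrix \(\chi\) gives \(\chi F^s = 0\), which is solved by the right singular vector of \(\chi\) with the smallest singular value. The estimate is then projected onto rank 2 by setting its smallest singular value to zero. Without the intrinsic parameters, one camera pair consistent with \(F\) is the canonical choice

\[
\Pi_1 = [\,I,\ 0\,], \qquad
\Pi_2 = [\,(\widehat{T'})^\top F,\ T'\,], \qquad (T')^\top F = 0,
\]

which is determined only up to a projective transformation. If the intrinsic matrices are known, \(E = K_2^\top F K_1\), and \([R,T]\) can be recovered from \(E\) instead.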
Euclidean Reconstruction
Instead of doing a complete, stratified reconstruction, it is worth using the last week of the semester to try out the OpenCV API, assuming that the cameras are available for calibration.