---
title: Eight-point algorithm
categories: session
---

**Briefing** [Eight-point algorithm Lecture]()

**Additional Reading**
Chapter 9 in *OpenCV 3 Computer Vision with Python Cookbook* by
Alexey Spizhevoy.
Search for it in [Oria](https://oria.no/).
There is an e-book available.

# Exercises
**Learning Outcome** 
Understand how the eight-point algorithm works on given points.

# Exercise

In this exercise we will implement and test the eight-point
algorithm on a synthetic dataset.  In a real-world application
you would have to identify and match features in the images, and
be prepared for mismatches, which requires a lot of error handling.
That would be too much for one exercise.

**Note.**
In this exercise you can choose how much you want to
compute by hand and how much you want to implement in Python.

Part 1 creates the synthetic dataset and is not part
of the solution to the reconstruction problem.
Part 2 solves the reconstruction problem.
Because you use a synthetic dataset, Part 1 provides you with
ground truth, which can be used to validate the solutions
in Steps 6-7.
Using synthetic data in this way is an important technique for
testing solutions in real-life development work as well.

## Part 1.  Making the Synthetic Dataset


### Step 1-1.

Select eight points in general position in 3D.  For instance
$$(-20,0,25),
(20,0,25),
(0,20,25),
(-10,10,50),
(10,10,50),
(0,-10,50),
(0,0,60),
(-7,7,70)$$
or generate some at random if you prefer.

**Note**  
The exercise has been updated to
move the points further from the camera (increasing their
$z$-coordinates).
It seems that at least one point in the first version fell behind
the image plane in the rotated camera frame.

### Step 1-2.

Consider a camera whose frame coincides with the world frame,
i.e. the selected points already have co-ordinates in the camera frame.
Assume the image plane is $z=1$.

+ Find the image point for each world point, using an ideal camera projection.
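A minimal sketch of this step in numpy, using the example points from
Step 1-1.  With the camera at the origin looking along the $z$-axis, an
ideal projection simply divides by the $z$-coordinate:

```python
import numpy as np

# Example object points from Step 1-1, one point per row.
X = np.array([
    (-20, 0, 25), (20, 0, 25), (0, 20, 25),
    (-10, 10, 50), (10, 10, 50), (0, -10, 50),
    (0, 0, 60), (-7, 7, 70),
], dtype=float)

def project(X):
    """Ideal pinhole projection onto the image plane z = 1.

    Returns homogeneous image points (x, y, 1), one per row.
    """
    return X / X[:, 2:3]

x1 = project(X)   # image points in the first camera
```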

### Step 1-3.

Consider a second camera which is translated ten units along
the $x$-axis and rotated $30^\circ$ around the $y$-axis.
Again the image plane is $z=1$.

+ Find the transformation (relative pose) $(R,T)$.
+ Find (3D) co-ordinates in the second camera frame for each world point.
+ Find the image points in the second camera frame for each world point.
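A sketch of one possible setup, continuing from the code above.  It
assumes the second camera's pose in the world frame is $(R_2,\mathbf{t}_2)$
with $\mathbf{t}_2=(10,0,0)^T$, so a world point $\mathbf{X}_1$ gets
camera-2 co-ordinates $\mathbf{X}_2=R\mathbf{X}_1+T$ with $R=R_2^T$ and
$T=-R_2^T\mathbf{t}_2$.  Check this against the convention from the lecture:

```python
import numpy as np

theta = np.deg2rad(30)           # rotation of camera 2 about the y-axis
R2 = np.array([                  # pose of camera 2 in the world frame
    [ np.cos(theta), 0, np.sin(theta)],
    [ 0,             1, 0            ],
    [-np.sin(theta), 0, np.cos(theta)],
])
t2 = np.array([10.0, 0.0, 0.0])  # camera 2 sits ten units along the x-axis

# Relative pose (R, T) mapping camera-1 co-ordinates to camera-2 co-ordinates.
R = R2.T
T = -R2.T @ t2

X2 = X @ R.T + T                 # object points in the second camera frame
x2 = project(X2)                 # image points in the second camera
```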

## Part 2.  The Eight-Point Algorithm

### Step 2-1.

Imagine now that we do not know the world points.
We are going to try to reconstruct them using the 
[Eight-point algorithm]().

1.  Construct the matrix $\chi$ whose rows are the Kronecker products of
    the eight image pairs.
2.  Compute the singular value decomposition
    $\chi=U_\chi\Sigma_\chi V_\chi^T$
    (probably using `numpy.linalg.svd`).
    - Inspect the three components.  What do you see?
    - The middle component holds the singular values; mathematically it
      is a diagonal matrix, but `numpy.linalg.svd` returns the values
      as a vector.  How are they ordered?
3.  According to Ma (2004:121) the serialised matrix $E^s$
    is the ninth column of $V_\chi$.
    - Note that `numpy.linalg.svd` returns $V_\chi^T$, so this is the
      ninth *row* of the returned matrix: the right singular vector
      belonging to the smallest singular value.
4.  Unserialise $E^s$ to get the preliminary matrix $E$.
    - You can do this with the numpy `reshape` function, but test it;
      you may have to transpose the resulting matrix.
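A sketch of these steps, continuing from the synthetic data above.  Two
pitfalls are worth noting: `numpy.linalg.svd` returns $V_\chi^T$, and the
row-major `reshape` of the column-stacked $E^s$ produces the transpose of $E$:

```python
import numpy as np

# Rows of chi are the Kronecker products a_i = x1_i (x) x2_i, so chi is 8 x 9.
chi = np.array([np.kron(a, b) for a, b in zip(x1, x2)])

U_chi, S_chi, Vt_chi = np.linalg.svd(chi)
print(S_chi)            # the singular values, in descending order

Es = Vt_chi[8]          # ninth column of V_chi = ninth row of V_chi^T
E = Es.reshape(3, 3).T  # unstack; transpose because E^s stacks E column-wise
```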
  
### Step 2-2.  Projection onto the Essential Space

If $E=U\Sigma V^T$ is the preliminary matrix found above,
we approximate the essential
matrix by replacing $\Sigma$ with $\mathsf{diag}(1,1,0)$
(cf. Ma (2004:121)).
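A minimal sketch of this projection (the sign and overall scale of $E$
remain undetermined at this point):

```python
import numpy as np

def project_to_essential(E):
    """Replace the singular values of E with (1, 1, 0) (cf. Ma 2004:121)."""
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt
```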

### Step 2-3.  Rotation and Translation

Recover the rotation matrix and the translation vector from the
essential matrix, relying on Ma (2004:121).

+ What do the two solutions represent?
+ Do they match the actual transformation?

**Remember** there are four solutions for $(R,T)$, three of
which place object points behind one or both cameras.
Remember to check both $E$ and $-E$; each choice of sign yields
two solutions for $(R,T)$.
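A sketch enumerating all four candidates.  It uses the classic
factorisation with $W=R_Z(+\pi/2)$, which is equivalent to the formulas
in Ma (2004:121); the check in Step 2-4 singles out the correct candidate:

```python
import numpy as np

def pose_candidates(E):
    """Enumerate the four (R, T) candidates hidden in an essential matrix."""
    U, _, Vt = np.linalg.svd(E)
    # Force proper rotations; E is only determined up to sign anyway.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],    # W = R_Z(+pi/2)
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    T = U[:, 2]   # unit length; rescale by the known baseline ||T|| = 10
    return [(R1, T), (R1, -T), (R2, T), (R2, -T)]
```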

**Note**
If the transformation you find seems off, typically with sign
errors, there are at least two plausible explanations.

1.  You have not checked all four solutions (see above).
2.  Some object point from Step 1 actually does end up behind
    the image plane in the second camera frame.  If this happens,
    you should try to choose different object points.

### Step 2-4.  Reconstruction in 3D

Consider one pair of image points together with the transformation 
$(R,T)$.

+ Calculate the $z$-coordinate $\lambda$
  of the 3D point, as we did in [Relative Pose]()
  (a sketch follows this list).
  You know that the distance between the camera origins is $||T||=10$.
+ Invert the projection to recover the complete coordinates of the
  3D point.
+ Does the recovered point match the original point?
+ If you have time, repeat with a second and possibly a third point.
  You probably do not want to do all eight, though.
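A sketch of the depth computation for one point, assuming $(R,T)$ is the
correct candidate from Step 2-3, with $T$ rescaled so that $||T||=10$.
Starting from $\lambda_2\mathbf{x}_2=\lambda_1 R\mathbf{x}_1+T$, taking
the cross product with $\mathbf{x}_2$ eliminates $\lambda_2$ and leaves a
one-unknown least-squares problem for $\lambda_1$:

```python
import numpy as np

def hat(v):
    """Skew-symmetric matrix such that hat(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def depth_in_first_camera(x1i, x2i, R, T):
    """Least-squares solution of  lam * hat(x2) @ R @ x1 = -hat(x2) @ T."""
    A = (hat(x2i) @ R @ x1i).reshape(-1, 1)
    b = -hat(x2i) @ T
    lam, *_ = np.linalg.lstsq(A, b, rcond=None)
    return lam[0]

lam = depth_in_first_camera(x1[0], x2[0], R, T)
X_rec = lam * x1[0]   # invert the projection; compare with the original point
```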

# Debrief


# Exercise using real images

In this exercise, we shall try to determine the relative pose
of two cameras, using the eight-point algorithm (or a variant thereof).

## Step 1.  Make a Data Set

1.  Take two images of the same scene, using different camera poses.
    - the difference between the poses should be significant,
      but small enough to recognise the same feature points.
    - i.e. two consecutive frames from a video will probably be too similar.
2.  Run the Harris Detector on both images, and identify at least
    eight features which you can pair between the images.
    - if you do not find eight, you need to use more similar poses.

**Note 1**
It may be useful to calibrate the camera(s) and undistort the
images before starting.  It is ok to try without calibration first,
for the sake of simplicity.

**Note 2**
You should pair the feature points manually in this exercise, to make
sure that no mismatches ruin your results.
When you have the first prototype working, you can try to pair feature
points programmatically, using SIFT or other methods to match features.
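A minimal detection sketch, assuming the two images are saved as
`left.jpg` and `right.jpg` (hypothetical filenames).
`cv2.goodFeaturesToTrack` with `useHarrisDetector=True` runs the Harris
detector under the hood; the manual pairing is then up to you:

```python
import cv2

def harris_corners(path, max_corners=50):
    """Detect Harris corners in a greyscale image; return (x, y) rows."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    corners = cv2.goodFeaturesToTrack(
        img, maxCorners=max_corners, qualityLevel=0.01,
        minDistance=10, useHarrisDetector=True, k=0.04)
    return corners.reshape(-1, 2)

pts_left = harris_corners("left.jpg")    # inspect and pair these manually
pts_right = harris_corners("right.jpg")
```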

## Step 2.  Essential matrix

Each pair $(\mathbf{x}_i,\mathbf{x}_i')$ of corresponding features
from the two images gives rise to a Kronecker product
$$\mathbf{a}_i=\mathbf{x}_i\otimes\mathbf{x}_i'.$$
Using these vectors as rows, we get the $n\times 9$ matrix $\chi$,
where $n$ is the number of feature pairs.

1.  Calculate $\chi$ from your dataset.
2.  Compute the singular value decomposition
    $\chi=U_\chi\Sigma_\chi V_\chi^T$
    using `numpy.linalg.svd`.
    - Inspect the three components.  What do you see?
    - The middle component holds the singular values; mathematically it
      is a diagonal matrix, but `numpy.linalg.svd` returns the values
      as a vector.  How are they ordered?
3.  According to Ma (2004:121) the serialised matrix $E^s$
    is the ninth column of $V_\chi$.
    - Note that `numpy.linalg.svd` returns $V_\chi^T$, so this is the
      ninth *row* of the returned matrix: the right singular vector
      belonging to the smallest singular value.
4.  Unserialise $E^s$ to get the preliminary matrix $E$.
    - You can do this with the numpy `reshape` function, but test it;
      you may have to transpose the resulting matrix.
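If you calibrated the camera (Note 1), convert the pixel coordinates to
normalised homogeneous coordinates before building $\chi$.  A sketch,
where `K` is a hypothetical calibration matrix standing in for your own:

```python
import numpy as np

# Hypothetical calibration matrix; substitute the one from your calibration.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

def normalise(pts, K):
    """Map pixel coordinates (n x 2) to homogeneous normalised coordinates."""
    n = pts.shape[0]
    homogeneous = np.hstack([pts, np.ones((n, 1))])  # append w = 1
    return homogeneous @ np.linalg.inv(K).T          # x_norm = K^-1 x_pix

x1 = normalise(pts_left, K)
x2 = normalise(pts_right, K)
# Build chi and continue exactly as in Step 2-1 of the synthetic exercise.
```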
  
## Step 3.  Projection onto the Essential Space

If $E=U\Sigma V^T$ is the preliminary matrix from Step 2, we
approximate the essential matrix by replacing $\Sigma$ with
$\mathsf{diag}(1,1,0)$ (cf. Ma (2004:121)).

## Step 4.  Rotation and Translation

Recover the rotation matrix and the translation vector from the
essential matrix, relying on Ma (2004:121).

What do the two solutions represent?

Does the recovered transformation seem reasonable, compared to how you
moved the cameras?

# Debrief