Revision 850063c395012a0142f778b706314f4b23a0ff06 (click the page title to view the current version)

Eight-point algorithm

Exercise
Debrief

Briefing Eight-point algorithm Lecture

Additional Reading Chapter 9 in OpenCV 3 Computer Vision with Python Cookbook by Alexey Spizhevoy (author). Search for it in Oria. There is an e-book available.

Learning Outcome Understand how the eight-point algorithm works on given points.

Exercise

In this exercise we will implement and test the eight-point algorithm using a synthetic dataset. In a real world application you will have to identify and match features in the image, and be prepared for mismatches which require a lot of error-handling. This would be too much for one exercise.

Note. You can choose, in this exercise, how much you want to compute by hand, and what you want to implement in python.

Part 1 makes the synthetic data set and is not part of the solutions for the reconstruction problem.
Part 2 solves the reconstruction problem. Because you use a synthetic dataset, Part 1 provides you with ground truth, which can be used to validate the solutions in Steps 6-7. Such use of synthetic data is important to test solutions also in real life development work.

Part 3 completes the circle by using the triangulation from Relative Pose to recover the 3D points.

Part 1. Making the synthetic dataset.

In this part, you may choose if you want to do the calculations by hand, or implement them in Python. You will only use the linear algebra techniques that we studied early in the semester.

Step 1-1.

Select eight points in general position in 3D. For instance \[(-20,0,25), (20,0,25), (0,20,25), (-10,10,50), (10,10,50), (0,-10,50), (0,0,60), (-7,7,70), \] or generate some at random if you prefer.

Note
The exercise has been updated to move the points further from the camera (increasing their \(z\)-coordinates). It seems that at least one point in the first version fell behind the image plane in the rotated camera frame.

Step 1-2.

Consider a camera posed in the world frame.
I.e. the selected points have co-ordinates in the camera frame. Assume the image plane has co-ordinates \(z=1\).

Find the image point for each world point, using an ideal camera projection.

Step 1-3.

Consider a second camere which is translated ten units along the \(x\)-axis and rotated \(30^\circ\) degrees around the \(y\)-axis. Again the image plane is \(z=1\).

Find the transformation (relative pose) \((R,T)\).
Find (3D) co-ordinates in the second camera frame for each world point.
Find the image points in the second camera frame for each world point.

Step 1-4.

You should now have two image points for each of the 3D points given in Step 1-1. This gives you eight pairs of image points which we are going to use in the following.

Part 2. The Eight-Point Algorithm

Imagine now that we do not know the world points. We are going to try to reconstruct them using the Eight-point algorithm.

Step 2-1.

Each pair \((\mathbf{x}_i,\mathbf{x}_i')\) of corresponding features from the two images give rise to a Kronecker product. \[\mathbf{a}_i=\mathbf{x}_i\otimes\mathbf{x}_i'\] Using these vectors as columns, we get the \(8\times n\) matrix \(\chi\), where \(n\) is the number of feature pairs.

Construct the matrix \(\chi\) with the Kronecker products of the eight image pairs.
Compute the singular value decomposition of \(\chi=U_\chi\Sigma_\chi V_\chi^T\) using numpy.linalg.svd
- Inspect the three components. What do you see?
- The middle component should be a diagonal matrix containing the singular values. How are they ordered?
According to Ma (2004:121) the serialised fundamental matrix \(E^s\) is the ninth column of \(V_\chi\).
- What do you have in the ninth column?
Unserialise \(E^s\) to get the preliminary matrix \(E\).
- you can do this with the numpy reshape function, but test it. You may have to transpose the resulting matrix.

Step 2-2. Projection onto the Essential Space

If \(E=U\Sigma V^T\) is our preliminary matrix as found above, we approximate the essential matrix by replacing \(\Sigma\) with \(\mathsf{diag}(1,1,0)\) (cf. Ma (2004:121)).

Step 2-3. Rotation and Translation

Recover Translation and Rotation matrices from the essential matrix, relying on Ma (2004:121).

What do the two solutions represent?
Do they match the actual transformation which you calculated in Step 1-3?

Remember there are four solutions for \((R,T)\), three of which produce objects point behind either or both cameras. Remember to check both \(E\) and \(-E\), in addition to the two solutions for \((R,T)\) from one choice for \(E\).

Note If the transformation you find seem off, typically with errors of sign, there are at least two plausible explanations.

You have not checked all four solutions (see above).
Some object point from Step 1 actually does end up behind the image plane in the second camera frame. If this happens, you should try to choose different object points.

Part 3. (Optional) Reconstruction in 3D

In this part we use the techniques from Relative Pose to reconstruct the 3D-points (as given in Step 1-1). You have already validated the eight-point algorithm by checking the transformation \((R,T)\), so this step is part of the wider system.

Consider one pair of image points together with the transformation \((R,T)\).

Calculating the \(z\)-coordinate \(\lambda\) of the 3D point as we did in Relative Pose. You know that the distance between the camera origins is \(||T||=10\).
Invert the projection to recover the complete coordinates of the 3D point.
Does the recovered point match the original point?
If you have time, repeat with a second and possibly a third point. You probably do not want to do all eight, though.

Debrief

This demo file (Jupyter Notebook) does not yet work, but may give some ideas, both for the solutions and for testing techniques. It is under construction. Thanks a lot to Modestas for his contributions.