---
title: Image Formation
categories: session
---

> Vision is the inverse problem of image formation


This session is prepared for **self-study**.
However, the room is available, and you should meet up
and take advantage of collaboration.

Use the **first hour** to watch and listen to the lectures
(videos and slideshows with audio).  See under [Briefing](#briefing).
Use the **rest of the time** to solve the
[Exercises](#exercises) as group work.
You should discuss possible solutions between yourselves **before**
you review the [Solutions](Solutions/Image Formation).
Remember that communicating and arguing solutions to your peers is
one of the most important learning outcomes of the module.

# Briefing {#briefing}

The actual briefing will extensively use blackboard drawings and
improvisation.  Hence the lecture notes below are **not complete**.

+ [Image Formation Lecture]()
+ Rudimentary notes from 2022: [Image Formation Notes]() and
  [Slides](http://www.hg.schaathun.net/talks/maskinsyn/camera.html)

## Learning Outcomes

During this session, the goal is to learn to master the following
concepts and models:

+ The image as a sampled function
+ Projection from 3D to 2D, as it occurs in a camera
+ The thin lens equation
+ The vanishing point
+ The thin lens model
+ Aperture and focus
+ The pinhole model


## The Eye Model

![Eye Model from *Introduction to Psychology* by University of Minnesota](Images/eye.jpg)

+ An image of the real world is projected on the Retina.
+ Modern Cameras (more or less) replicate the Eye Model.

## Image Representation 

![The Mandrill Test Image (public domain)](Images/mandrill.png)

![Grey Scale version of Mandrill (public domain)](Images/mandrill-grey.png)

The Retina, or image sensor, is able to sense the projected rays.

A fine grid of sensors or perceptive cells is able to measure, or sample,
the light intensity falling upon it.

Let's have a look at the resulting data, using the popular
mandrill image shown to the right.

```python
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

# Load the test image.  Note that cv.imread returns a three-channel
# (BGR) array by default, even for a grey-scale file.
im = cv.imread("mandrill-grey.png")
```

We have now loaded the image as the object `im`.
First, we observe that it is a matrix; printing `im` gives the following:

```
Out[55]:
array([[[ 76,  76,  76],
        [ 60,  60,  60],
        [ 68,  68,  68],
        ...,
        [ 90,  90,  90],
        [ 97,  97,  97],
        [ 96,  96,  96]],

       [[ 83,  83,  83],
        [ 72,  72,  72],
        [ 77,  77,  77],
        ...,
        [ 99,  99,  99],
        [ 87,  87,  87],
        [106, 106, 106]],

       [[ 51,  51,  51],
        [ 75,  75,  75],
        [117, 117, 117],
        ...,
        [ 99,  99,  99],
        [ 81,  81,  81],
        [ 88,  88,  88]],

       ...,

       [[139, 139, 139],
        [140, 140, 140],
        [136, 136, 136],
        ...,
        [ 92,  92,  92],
        [ 96,  96,  96],
        [ 78,  78,  78]],

       [[131, 131, 131],
        [144, 144, 144],
        [138, 138, 138],
        ...,
        [ 85,  85,  85],
        [ 98,  98,  98],
        [ 90,  90,  90]],

       [[109, 109, 109],
        [102, 102, 102],
        [109, 109, 109],
        ...,
        [ 57,  57,  57],
        [ 67,  67,  67],
        [ 69,  69,  69]]], dtype=uint8)

In [56]: im.shape
Out[56]: (128, 128, 3)
```

A little confusingly, this matrix has three dimensions, as if it
were a colour (RGB) image.  Since it is grey scale, we only need
one $128\times128$ matrix.  As we see above, the three values in each
triple are equal, so we can take one arbitrary plane from the matrix.
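
One way to reduce the data to a single plane (a sketch — the exact command
is not shown in the session log below) is to slice out one channel, or to
ask OpenCV for a single-channel array when loading:

```python
im = im[:, :, 0]   # keep one plane; all three are equal for this grey-scale file

# Alternatively, load the file directly as a single-channel grey-scale image:
# im = cv.imread("mandrill-grey.png", cv.IMREAD_GRAYSCALE)
```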

```
Out[58]:
array([[ 76,  60,  68, ...,  90,  97,  96],
       [ 83,  72,  77, ...,  99,  87, 106],
       [ 51,  75, 117, ...,  99,  81,  88],
       ...,
       [139, 140, 136, ...,  92,  96,  78],
       [131, 144, 138, ...,  85,  98,  90],
       [109, 102, 109, ...,  57,  67,  69]], dtype=uint8)

In [59]:
```

This is the **first representation** of a grey scale image, as
an $n\times m$ matrix.

OpenCV provides functions to display the matrix as an image on screen.
This is the **second representation** of the image.

```
In [60]: cv.imshow("mandrill grey", im)

In [61]: cv.waitKey(1)
Out[61]: -1
```
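
The call to `cv.waitKey` lets the window process its GUI events so that the
image actually appears.  Alternatively (a sketch, not part of the original
session log), the same matrix can be displayed with matplotlib, which is
already imported above:

```python
plt.imshow(im, cmap="gray")   # render the matrix values as grey levels
plt.show()
```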

![3D surface plot of the image signal](Images/mandrill-plot.svg)

The matrix can also be read as a signal, sampling
values $I(x,y)$ for different values of $x$ and $y$.
This gives the **third representation**, as a 3D
surface plot.

```
In [7]: plt.ion()
Out[7]: <matplotlib.pyplot._IonContext at 0x7fb5737006d0>

In [8]: fig, ax = plt.subplots(subplot_kw={"projection": "3d"})

In [9]: xn,yn = im.shape

In [10]: xn,yn
Out[10]: (128, 128)

In [11]: X,Y=np.meshgrid(range(xn),range(yn))

In [12]: X
Out[12]:
array([[  0,   1,   2, ..., 125, 126, 127],
       [  0,   1,   2, ..., 125, 126, 127],
       [  0,   1,   2, ..., 125, 126, 127],
       ...,
       [  0,   1,   2, ..., 125, 126, 127],
       [  0,   1,   2, ..., 125, 126, 127],
       [  0,   1,   2, ..., 125, 126, 127]])

In [13]: Y
Out[13]:
array([[  0,   0,   0, ...,   0,   0,   0],
       [  1,   1,   1, ...,   1,   1,   1],
       [  2,   2,   2, ...,   2,   2,   2],
       ...,
       [125, 125, 125, ..., 125, 125, 125],
       [126, 126, 126, ..., 126, 126, 126],
       [127, 127, 127, ..., 127, 127, 127]])

In [14]: ax.plot_surface(X,Y,im)
Out[14]: <mpl_toolkits.mplot3d.art3d.Poly3DCollection at 0x7fb52cd8ad00>
```

Observant readers may notice that the plot is upside down compared
to the image.  This is because, conventionally, $(0,0)$ is the
top left hand pixel, while a plot would usually place $(0,0)$ at the lower
left hand side.

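If you prefer the surface plot to match the image orientation, one option
(a sketch, not something done in the session log) is to invert the
$y$-axis of the 3D axes:

```python
ax.invert_yaxis()       # put (0,0) at the top, as in the image convention
fig.canvas.draw_idle()  # refresh the interactive figure
```
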
While coordinates in the real world are real (continuous) numbers,
images are always *sampled* at a finite number of points or pixels.
In digital photography, this is because the image sensor is a grid
of individual pixel sensors.
It is also true for photographic film, which is composed of light-sensitive
silver halide crystals.
These crystals are large enough to make a visibly coarse structure when
the image is enlarged.
It is even true for the human eye, which has a finite number of light-sensitive
cells, although in this case, we cannot (usually) notice the finiteness.

Admittedly, the grid structure of the sensor/film/eye is not a regular
rectangular grid.  Each pixel in the raw data from a digital sensor
usually includes only one colour (red, green, or blue), so that the
different colour bands are not sampled at exactly the same position.
There is some in-camera post-processing which gives the pixmap structure
that we know, with three colours per pixel, in a rectangular grid.
However, this is beyond the scope of this module, and we can safely
ignore it.

## Thin Lens Model

### The Focus Point

![Burning Glass from *College Physics* by OpenStax College](Images/sun.jpg)

![Diagram of Lens Focus Point from *College Physics* by OpenStax College](Images/lens.jpg)

+ A convex lens collects, or focuses, parallel rays into a single focus point.
+ This works as a burning glass.
    - The sun is so far away that the sun rays are parallel for all
      practical purposes.
+ Definitions
    - The **Optical Axis** is the line perpendicular to the lens,
      through its centre.
    - The **Focus** is a point on the Optical Axis.  Rays which enter the
      lens parallel to the optical axis are deflected so that they intersect
      at the Focus.
    - The **Focal Length** is the distance between the lens and the Focus.
      (We ignore the thickness of the lens.)
    - The **Focal Plane** is a plane through the Focus, perpendicular
      to the Optical Axis.

### The Image Plane

+ The image plane
    - non-parallel rays
+ The thin lens equation
+ Points further away
+ The aperture

## The Pinhole Model

+ Reference frame
+ Co-ordinates

## Geometry of Image Formation

# Exercises {#exercises}

Exercises are from Ma 2004 page 62ff.

I recommend discussing the following problems in small groups.
Use figures and diagrams as a basis for your discussion where possible.

If you prefer, you may consult the
[Solutions](Solutions/Image Formation) after each individual exercise.

## Equivalence of Points (Based on Exercise 3.1)

> Show that any point on the line through $o$ (optical centre) and
> $p$ projects onto the same image co-ordinates as $p$.

### Comments
1.  Start by drawing the lens, image, the points $p$ and $o$,
    and the image point.
2.  What does the drawing tell you about the problem?
    Add details to the drawing as required.
3.  Recall the equations which relate the $(x,y)$ co-ordinates of the
    image point to the $(X,Y,Z)$ co-ordinates of $p$.
    (Write it down.)
4.  Consider a different point $p'$ on the same line, and add it
    to your drawing.  Where is its image point?
5.  How do the co-ordinates $(X',Y',Z')$ of $p'$ relate to $(X,Y,Z)$ and $(x,y)$?
6.  From the above, you should have two arguments solving the
    problem, one geometric and one algebraic.
    Each deserves attention.
    Are these arguments convincing?
    Complete any details as required.
7.  Reflect on the relationship between the algebraic and the
    geometric argument.
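
For step 3, it may help to write down the standard perspective (pinhole)
projection; this is a sketch under the usual ideal camera model with focal
length $f$ and optical centre $o$ at the origin, not a quotation from Ma 2004:

$$ x = f\,\frac{X}{Z}, \qquad y = f\,\frac{Y}{Z} $$

For steps 4 and 5, note that any point on the line through $o$ and $p$ can be
written as $p' = (\lambda X, \lambda Y, \lambda Z)$ for some scalar $\lambda \neq 0$.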

## Out of Focus (Exercise 3.2)

> Consider a thin lens imaging a plane parallel to the lens at a distance
> $z$ from the focal plane.
> Determine the region of this plane that contributes to the image $I$
> at the point $x$.
> (Hint: consider first a one-dimensional imaging model, then extend to a
> two-dimensional image.)

**Note**:
The question makes sense if you assume that the
plane is out of focus, which is not possible in the pinhole model but
is possible in the more general thin lens model.

1. Always start by making a drawing of the model.
2. Add all concepts mentioned in the problem text to the figure
   (as far as possible).
3. Add any additional concepts that you find important.
4. Identify the concept in question, that is the region contributing
   to the point $x$ in this case.
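
The briefing names the thin lens equation but does not write it out.  As a
reminder (with symbols chosen here — note that this $b$ is not the $z$ of the
exercise text), for an object at distance $Z$ in front of a lens of focal
length $f$, the rays converge at distance $b$ behind the lens, where

$$ \frac{1}{Z} + \frac{1}{b} = \frac{1}{f}. $$

A plane is in focus when the image plane sits exactly at distance $b$;
otherwise each object point spreads over a small region on the image plane.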

## Scale Ambiguity (Exercise 3.8)

> It is common sense that with a perspective camera, one cannot
> tell an object from another object that is exactly twice
> as big but twice as far.
> This is a classic ambiguity introduced by the perspective projection.
> Use the ideal camera model to explain why this is true.
> Is the same also true for the orthographic projection? Explain.

1. You can start with the problem you drew above for Exercise 1 (Ma:3.1).
   Consider an object extending between two points $p_1$ and $p_2$ in a
   plane parallel to the lens.  Draw this situation.
2. Imagine that both points move on a line through the optical centre $o$,
   as you did in Exercise 1.  What happens to the image?
   What happens to the object extending between $p_1$ and $p_2$?
3. Write up an argument based on the above reflections.
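
As a reminder (not part of the exercise text), the two projections of a
point $(X,Y,Z)$ can be written, up to a constant scale factor in the
orthographic case, as

$$ \text{perspective: } x = f\,\frac{X}{Z}, \qquad \text{orthographic: } x = X. $$

Comparing how each expression changes when $(X,Y,Z)$ is replaced by
$(2X,2Y,2Z)$ is a good starting point for the argument.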

## Field of View (based on Exercise 3.3 Part 1)

> How can we describe the area (in 3D) observed by a camera?

Consider a camera with focal length 24 mm and a retinal plane
(CCD array) of 16 mm x 12 mm.

1.  As always, start with a drawing.  Draw the pinhole model.
    Consider only the $x$-direction where the sensor is 16 mm.
    (You can do the $y$-direction (12 mm) afterwards.)
2.  Write the known lengths into the figure.
3.  Where are the points which are observable to the camera?
    Reflect on the question.
4.  You should find that the observable points fall between two
    lines through the focus (pinhole).  Calculate the angle
    $\theta$ between these two lines.
    + Note that the optical axis, through the focus, is orthogonal
      to and centred on the sensor array.  It may be easier to calculate
      the angle $\theta/2$ between the optical axis and one of the edge
      lines.
5.  The angle $\theta$ is known as the field of view (FoV).
    Once you have calculated the FoV for this specific camera,
    give an expression for the FoV as a function of the focal length $f$
    and the radius of the sensor $r$.
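
Once you have derived your own expression, you can check it numerically.
The sketch below assumes the expression comes out on the form
$\theta = 2\arctan(r/f)$; verify that against your own derivation before
trusting the numbers.

```python
import math

def field_of_view(f, r):
    """Field of view in degrees, assuming theta = 2*arctan(r/f)."""
    return math.degrees(2 * math.atan(r / f))

# focal length 24 mm; sensor half-widths 8 mm (horizontal) and 6 mm (vertical)
print(field_of_view(24, 8))
print(field_of_view(24, 6))
```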

## Real World and Image Co-ordinates (based on Exercise 3.3 Part 2)

> Given a point $(X,Y,Z)$ in 3D, what are the co-ordinates $(x,y)$ of
> the image point?

Consider the same camera system and model as you used in the previous 
exercise.
Consider first a point with co-ordinates $(X,Y,Z)=(6m,4m,8m)$.

1.  Draw first the pinhole model in the $x$-direction and find the
    $x$-co-ordinate corresponding to $X=6m$.
2.  Then draw the model in the $y$-direction and find the
    $y$-co-ordinate corresponding to $Y=4m$.
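
If you want to check your drawing numerically, here is a minimal sketch.
It assumes the ideal pinhole equations $x = fX/Z$ and $y = fY/Z$; the helper
function is introduced here for illustration only.

```python
def project(X, Y, Z, f=0.024):
    """Ideal pinhole projection; all lengths in metres."""
    return f * X / Z, f * Y / Z

x, y = project(6, 4, 8)
print(x, y)   # compare with the 0.016 m x 0.012 m sensor
```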

# Debrief {#debrief}

See [Solutions](Solutions/Image Formation)

<!--
1.  Questions and Answers
2.  Recap as required
-->

# Credits

[Introduction to Psychology](https://open.lib.umn.edu/intropsyc)
by University of Minnesota is licensed under a
[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-nc-sa/4.0/), except where otherwise noted.

[*College Physics*](http://cnx.org/contents/031da8d3-b525-429c-80cf-6c8ed997733a/College_Physics).
Authored by: OpenStax College.
License:
[CC BY: Attribution](https://creativecommons.org/licenses/by/4.0/).
License Terms: Located at License