
Introductory Session to Machine Learning


  • Ma 2004 Chapter 1.


  1. Briefing Overview and History
  2. Install and Test Software
    • Simple tutorials
  3. Debrief questions and answers
    • recap of linear algebra

1 Briefing

Practical Information


  • Wiki - living document - course content
  • BlackBoard - announcements - discussion fora
  • Questions - either
    • in class
    • in discussion fora
  • Email will only be answered when there are good reasons not to use public fora.

Taught Format

  • Sessions 4h twice a week
    • normally 1h briefing + 2h exercise + 1h debrief (may vary)
  • Exercises vary from session to session
    • mathematical exercises
    • experimental exercises
    • implementational exercises
  • No Compulsory Exercises
  • Feedback in class
    • please ask for feedback on partial work
  • Keep a diary. Make sure you can refer back to previous partial solutions and reuse them.

Learning Outcomes

  • Knowledge
    • The candidate can explain fundamental mathematical models for digital imaging, 3D models, and machine vision
    • The candidate is aware of the principles of digital cameras and image capture
  • Skills
    • The candidate can implement selected techniques for object recognition and tracking
    • The candidate can calibrate cameras for use in machine vision systems
  • General competence
    • The candidate has a good analytic understanding of machine vision and of the collaboration between machine vision and other systems in robotics
    • The candidate can exploit the connection between theory and application for presenting and discussing engineering problems and solutions


  • Oral exam \(\sim 20\) min.
  • First seven minutes are yours
    • make a case for your grade with respect to the learning outcomes
    • your own implementations may be part of the case
    • what matters is that you can explain the implementation analytically
  • The remaining 13-14 minutes are for the examiner to explore further
  • More detailed assessment criteria will be published later


Eye Model from Introduction to Psychology by University of Minnesota
  • Vision is a 2D image on the retina
    • Each cell perceives the intensity and colour of the light projected onto it
  • Easily replicated by a digital camera
    • Each pixel is the light intensity sampled at a given point on the image plane


1912 International Lawn Tennis Challenge
  • Human beings see 3D objects
    • not pixels of light intensity
  • We recognise objects - cognitive schemata
    • we see a ball - not a round patch of white
    • we remember a tennis match - more than four people with white clothes and rackets
  • We observe objects arranged in depth
    • in front of and behind the net
    • even though they are all patterns in the same image plane
  • 3D reconstruction from 2D retina image
    • and we do not even think about how


  • Artificial systems interact with their surroundings
    • navigate in a 3D environment
  • Simpler applications
    • face recognition
    • tracking in surveillance cameras
    • medical image diagnostics (classification)
    • image retrieval (topics in a database)
    • detecting faulty products on a conveyor belt (classification)
    • aligning products on a conveyor belt
  • Other advances in AI create new demands on vision
    • 20 years ago, walking was a major challenge for robots
    • now robots walk, and they need to see where they go …


  • Artificial systems interact with their surroundings
    • navigate in a 3D environment
  • This means
    • Geometry of multiple views
    • Relationship between theory and practice
    • … between analysis and implementation
  • Mathematical approach
    • inverse problem; 3D to 2D is easy, the inverse is hard
    • we need to understand the geometry to know what we program
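The asymmetry between the forward and the inverse problem can be illustrated with a small sketch. It assumes an ideal perspective projection onto the plane at depth 1, a model that has not been introduced yet in this session; the function name `project` and the sample coordinates are chosen for illustration only.

```python
import numpy as np

def project(X):
    """Ideal perspective projection onto the plane at depth 1 (assumed model)."""
    X = np.asarray(X, dtype=float)
    # Divide the first two coordinates by the depth (third coordinate).
    return X[:2] / X[2]

# A 3D point and a second point three times further away along the same ray.
X = np.array([1.0, 2.0, 4.0])
x_near = project(X)
x_far = project(3.0 * X)

# Both 3D points land on the same 2D image point, so depth is lost.
print(x_near, x_far)
```

Going from 3D to 2D is a single division, but the image point alone cannot tell the two 3D points apart. This is why more than one view, and an understanding of the geometry, is needed for reconstruction.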


  • 1435: Leon Battista Alberti, Della Pictura - first general treatise on perspective
  • 1648: Girard Desargues - projective geometry
  • 1913: Kruppa - two views of five points suffice to find
    • relative transformation
    • 3D location of the points
    • (up to a finite number of solutions)
  • mid 1970s: first algorithms for 3D reconstruction
  • 1981: Longuet-Higgins - linear algorithm for structure and motion
  • late 1970s: E. D. Dickmanns starts work on vision-based autonomous cars
    • 1984: small truck at 90 km/h on empty roads
    • 1994: 180 km/h, passing slower cars


  • Demos and tutorials in Python
    • you can use whatever language you want
    • we avoid Jupyter to make sure we can use camera and interactive displays easily
  • Demos and help are given on a Unix-like system (which may or may not include macOS)
  • In the exercise sessions
    • install necessary software
    • use the tutorials to see that things work as expected
  • In the debrief, we will start briefly on the mathematical modelling
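Before starting on the tutorials, a quick check that the Python environment can find the packages you expect may save some time. The sketch below is only a suggestion: the package list in `REQUIRED` is hypothetical and should be replaced with whatever the tutorials actually ask for, and `check_packages` is a helper name made up for this example.

```python
import importlib.util
import sys

# Hypothetical package list; replace it with what the tutorials require.
REQUIRED = ["numpy", "cv2", "matplotlib"]

def check_packages(names):
    """Map each top-level module name to True if Python can locate it."""
    # find_spec looks the module up without importing it.
    return {name: importlib.util.find_spec(name) is not None for name in names}

if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in check_packages(REQUIRED).items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

Run it as a plain script from the terminal. A package reported as MISSING has to be installed before the corresponding tutorial can work.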

2 Tutorials

3 Debrief

  1. Discuss problems arising from the practical session.
  2. Repeat basic linear algebra (below).
  3. Possibly start on 3D Mathematics - probably not though.

Vectors and Points

  • A point in space \(\mathbf{X} = [X_1,X_2,X_3]^\mathrm{T}\in\mathbb{R}^3\)
  • A bound vector, from \(\mathbf{X}\) to \(\mathbf{Y}\): \(\overrightarrow{\mathbf{XY}}\)
  • A free vector is the same difference, but without any specific anchor point
    • represented as \(\mathbf{Y} - \mathbf{X}\)
  • The set of free vectors forms a linear vector space
    • note that points do not
    • The sum of two vectors is another vector
    • The sum of two points is not a point
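The distinction between points and free vectors can be checked numerically. The following is a minimal sketch using NumPy; the coordinates and the translation `t` are arbitrary illustrative values.

```python
import numpy as np

# Two points in space, stored as coordinate triples.
X = np.array([1.0, 2.0, 3.0])
Y = np.array([4.0, 6.0, 3.0])

# The free vector from X to Y is the coordinate difference.
v = Y - X

# Move the origin, i.e. translate both points by the same offset t.
t = np.array([10.0, -5.0, 2.0])
v_translated = (Y + t) - (X + t)

# The free vector is unaffected by the translation.
same_vector = np.allclose(v, v_translated)

# Adding a vector to a point gives a point; here we recover Y.
Y_again = X + v

# The coordinate sum of two points shifts by 2t, not t, so it does not
# behave like a point under a change of origin.
sum_shift = ((X + t) + (Y + t)) - (X + Y)

print(v, same_vector, sum_shift)
```

The last line of the computation is the reason the sum of two points is not considered a point: its coordinates depend on where the origin is placed in a way that no actual point does.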

Dot product (inner product)

\[x=\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}\quad y=\begin{bmatrix}y_1\\y_2\\y_3\end{bmatrix}\]

Inner product \[\langle x,y\rangle = x^\mathrm{T}y = x_1y_1+x_2y_2+x_3y_3\]

Euclidean Norm \[||x|| = \sqrt{\langle x,x\rangle}\]

Orthogonal vectors when \(\langle x,y\rangle=0\)
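As a small numerical illustration of the definitions above, the following NumPy sketch computes an inner product and a norm. The two vectors are illustrative values chosen so that they happen to be orthogonal.

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([2.0, -1.0, 0.0])

# Inner product <x, y> = x^T y = 1*2 + 2*(-1) + 2*0
inner = x @ y

# Euclidean norm ||x|| = sqrt(<x, x>) = sqrt(1 + 4 + 4)
norm_x = np.sqrt(x @ x)

# x and y are orthogonal when the inner product vanishes.
orthogonal = np.isclose(inner, 0.0)

print(inner, norm_x, orthogonal)
```

NumPy also provides `np.linalg.norm(x)`, which gives the same value as `norm_x` for a vector like this one.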

Cross product

\[x\times y = \begin{bmatrix} x_2y_3 - x_3y_2 \\ x_3y_1 - x_1y_3 \\ x_1y_2 - x_2y_1 \end{bmatrix} \in \mathbb{R}^3\]

Observe that

  • \(y\times x = -x\times y\)
  • \(\langle x\times y, y\rangle= \langle x\times y, x\rangle = 0\)

\[x\times y = \hat xy \quad\text{where}\quad \hat x = \begin{bmatrix} 0 & -x_3 & x_2 \\ x_3 & 0 & -x_1 \\ -x_2 & x_1 & 0 \end{bmatrix} \in \mathbb{R}^{3\times3}\]

\(\hat x\) is a skew-symmetric matrix because \(\hat x=-\hat x^\mathrm{T}\)
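The properties of the cross product can be verified numerically. The sketch below uses NumPy; the helper name `hat` is chosen here to mirror the notation \(\hat x\), and the vectors are arbitrary illustrative values.

```python
import numpy as np

def hat(x):
    """Return the 3x3 skew-symmetric matrix of x, so that hat(x) @ y = x × y."""
    x1, x2, x3 = x
    return np.array([
        [0.0, -x3,  x2],
        [ x3, 0.0, -x1],
        [-x2,  x1, 0.0],
    ])

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# Cross product computed by NumPy and by the matrix form.
c = np.cross(x, y)
c_hat = hat(x) @ y

print(c)                                  # [-3.  6. -3.]
print(np.allclose(c, c_hat))              # matrix form agrees
print(np.allclose(np.cross(y, x), -c))    # anti-commutativity
print(np.isclose(c @ x, 0.0), np.isclose(c @ y, 0.0))  # orthogonal to x and y
print(np.allclose(hat(x), -hat(x).T))     # skew-symmetry
```

Writing the cross product as a matrix product is convenient because it turns \(x\times y\) into an ordinary linear map applied to \(y\).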

Right Hand Rule


Skew-Symmetric Matrix


Change of Basis