
Feature Descriptors

Briefing

The challenge is that authors use many words without definition. These terms are assumed to be known, which holds within a discipline but not when readers cross over between computing, signal processing, automation, mathematics, and other disciplines. It therefore took me some time to triangulate several sources and pin down some core concepts. This briefing will focus on them.

Feature Matching

In feature tracking, we use the time derivative of the image to track each feature point once it has been detected.
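As a concrete illustration, here is a minimal sketch of derivative-based tracking using OpenCV's pyramidal Lucas-Kanade implementation, which estimates motion from the spatial and temporal image derivatives. The frame file names are hypothetical placeholders for two consecutive frames of a video.

```python
import cv2

# Hypothetical file names: two consecutive frames of a video.
prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

# Detect corner features to track in the first frame.
p0 = cv2.goodFeaturesToTrack(prev, maxCorners=200, qualityLevel=0.01, minDistance=7)

# Track them into the next frame with pyramidal Lucas-Kanade.
p1, status, err = cv2.calcOpticalFlowPyrLK(prev, curr, p0, None)

# Keep only the points that were tracked successfully.
good_old = p0[status.flatten() == 1]
good_new = p1[status.flatten() == 1]
```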

In feature matching, we compare images which are not necessarily related on the time axis. In other words, we need to compare features directly, and pair corresponding features in two independent images.

To identify features, we use feature descriptors. One of the most popular descriptors is called SIFT, the Scale-Invariant Feature Transform.

Unfortunately, this is not described in the textbook. For details, one can refer to the relevant chapter of Szeliski (2022).

The principle used by SIFT is to gather statistics from the neighbourhood around the feature point, forming a vector which is used as an identifier.
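To make this concrete, here is a minimal sketch using OpenCV's SIFT implementation to compute descriptors in two images and pair them with a brute-force matcher and Lowe's ratio test. The image file names are hypothetical placeholders for two overlapping views of the same scene.

```python
import cv2

# Hypothetical file names: any two overlapping views of the same scene.
img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()

# Each descriptor is a 128-dimensional vector of gradient statistics
# gathered from the neighbourhood of its keypoint.
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Brute-force matching with Lowe's ratio test: accept a pair only if the
# best match is clearly better than the second best.
bf = cv2.BFMatcher()
matches = bf.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
```

The ratio test is what turns raw nearest-neighbour distances into reliable pairings: an ambiguous descriptor, whose two best candidates are almost equally close, is discarded rather than matched.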

When we move to Chapter 5 of Ma’s textbook, we will use feature matching to triangulate points in 3D and determine their depth, that is, their distance from the cameras.

Gaussian Scale Space

  1. The image is a 2D matrix/signal \(I(x,y)\).
  2. When we smooth it with a Gaussian, we take a Gaussian function \(G(x,y,\sigma)\) for some standard deviation \(\sigma\) and calculate the convolution \[L(x,y,\sigma)=G(x,y,\sigma)*I(x,y)\]
  3. We can read this as a 3D signal/tensor, i.e. we have a space in \(x\), \(y\), and \(\sigma\) co-ordinates; see the sketch after this list.
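A minimal sketch of this construction, assuming NumPy and SciPy and using a synthetic random image in place of a real one:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Hypothetical input: any 2D grayscale image I(x, y) as a float array.
I = np.random.rand(256, 256)

# Sample sigma on a geometric grid, as SIFT does within an octave.
sigmas = 1.6 * 2.0 ** (np.arange(5) / 2.0)

# Stack the smoothed images L(x, y, sigma) into a 3D tensor:
# axis 0 indexes the scale sigma, axes 1 and 2 are the spatial coordinates.
L = np.stack([gaussian_filter(I, sigma) for sigma in sigmas])
print(L.shape)  # (5, 256, 256)
```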