# Evaluation and Statistics

**Reading** Szeliski (2022): Computer Vision: Algorithms and Applications, 2nd ed. Chapter 5.1-5.2

# Exercises

## Exercise 1

Review the machine learning systems you studied last week. Calculate the confusion matrix for each of the systems.

- Are the errors reasonably balanced?
- Are any classes particularly difficult to detect?
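A confusion matrix can be tallied directly from predicted and true labels. The sketch below is a minimal NumPy version; the labels are made up, standing in for the output of one of last week's systems.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count how often each true class (rows) is predicted as each class (columns)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Hypothetical labels standing in for a real classifier's output.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]
print(confusion_matrix(y_true, y_pred, 3))
```

Row sums give the number of test items per class, so a row with small diagonal mass flags a class that is difficult to detect.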

## Exercise 2

Review again each of the systems from last week.

- Calculate the false positive and false negative rates.
- Estimate the standard deviation of the error rates.
- Assess the quality of the machine learning system. Can you be confident that the error probability is satisfactory?
- How large does the test set have to be to make a confident assessment?
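Since each test item is one independent trial, an error rate estimated as \(k/n\) is binomial, with standard deviation \(\sqrt{p(1-p)/n}\); inverting that relation gives the test-set size needed for a target precision. A minimal sketch (the counts are hypothetical):

```python
import math

def error_rate_std(k, n):
    """Binomial estimate of an error rate and the standard deviation of that estimate."""
    p = k / n
    return p, math.sqrt(p * (1 - p) / n)

def required_test_size(p, target_std):
    """Test-set size needed so the rate estimate has the target standard deviation."""
    return math.ceil(p * (1 - p) / target_std**2)

# Hypothetical counts: 12 false positives out of 400 negative test items.
p, s = error_rate_std(12, 400)
print(f"FP rate {p:.3f} +/- {s:.3f}")
print(required_test_size(0.03, 0.005))  # items needed for +/-0.5% precision
```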

See also exercises on Regression intended for tomorrow.

# Briefing

## Recap

*What did we learn last week?*

- Supervised learning
- Loss function – Cross-Entropy
- \(E(\mathbf{w}) = -\sum_n \ln p_{n,t_n}\)
- \(p_{n,k}\) is the network's estimated probability that object \(n\) has class \(k\)
- \(t_n\) is the true class of object \(n\)
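A minimal NumPy sketch of this loss, taking the log of the probability each row assigns to its true class (the example probabilities are made up):

```python
import numpy as np

def cross_entropy(p, t):
    """E(w) = -sum_n ln p_{n,t_n}: p is an (N, K) array of predicted
    probabilities, t the integer true classes."""
    rows = np.arange(len(t))
    return -np.sum(np.log(p[rows, t]))

# Two objects, three classes; each true class gets high probability.
p = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1]])
t = np.array([0, 1])
print(cross_entropy(p, t))  # -(ln 0.8 + ln 0.7)
```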

## Regression problem - Gravitational Lensing

- Lensing model
- Source - size \(\sigma\) and position \((x,y)\)
- Lens - Einstein radius \(R_E\)
- Distorted Image

- Four parameters determine the distorted image.
- Can we recover these four parameters from the image?

- Instead of a discrete class, we want the network to predict \(\sigma,x,y,R_E\)
- Loss function is Mean Squared Error
- \(\mathsf{MSE} = \frac14\left[(\sigma-\hat\sigma)^2+(x-\hat x)^2 +(y - \hat y)^2+(R_E-\hat R_E)^2\right]\)
- Note, normalisation can vary.
- The starting point is the sum of squared errors (SSE).
- We may or may not normalise by dividing by the number of data points and/or the number of predicted parameters.
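The MSE over the four parameters can be sketched as below; the parameter values are hypothetical, and this version normalises by the number of parameters (NumPy's mean):

```python
import numpy as np

def mse(true_params, pred_params):
    """Mean squared error over the four lensing parameters (sigma, x, y, R_E)."""
    true_params = np.asarray(true_params, dtype=float)
    pred_params = np.asarray(pred_params, dtype=float)
    return np.mean((true_params - pred_params) ** 2)

# Hypothetical true and predicted (sigma, x, y, R_E).
print(mse([0.5, 1.0, -0.5, 2.0], [0.4, 1.2, -0.4, 1.9]))
```

Dropping the `np.mean` in favour of `np.sum` gives the SSE variant mentioned above.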

## Evaluation

- Confusion Matrix
- false positive, false negative

- Accuracy

\[ \mathsf{Acc} = \frac{ \mathsf{TP} + \mathsf{TN} }{ \mathsf{TP}+\mathsf{FN} + \mathsf{TN}+\mathsf{FP} }\]

- Warning: biased datasets
- Other heuristics

\[ F_1 = \frac{ 2\cdot\frac{\mathsf{TP}}{\mathsf{TP}+\mathsf{FP}} \cdot \frac{\mathsf{TP}}{\mathsf{TP}+\mathsf{FN}} }{ \frac{\mathsf{TP}}{\mathsf{TP}+\mathsf{FP}} + \frac{\mathsf{TP}}{\mathsf{TP}+\mathsf{FN}} }\]
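Both heuristics reduce to arithmetic on the four confusion-matrix counts; \(F_1\) is the harmonic mean of precision and recall. A sketch with made-up counts:

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all test items classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall (the true negatives play no role)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts from a confusion matrix.
print(accuracy(80, 90, 10, 20))  # 0.85
print(f1_score(80, 10, 20))
```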

- TP/TN/FP/FN are Stochastic Variables
- Binomially Distributed

- Regression: absolute or squared error
- Also a Stochastic Variable (depends on the data drawn randomly from the population)
- Mean over a large dataset gives a reasonable estimator
- Standard Deviation can be estimated using the sample standard deviation
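A sketch of the estimation step, using synthetic per-item squared errors in place of a real test run; note that the standard deviation of the *mean* error shrinks as \(1/\sqrt{N}\):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic per-item squared errors standing in for a real regression test set.
errors = rng.exponential(scale=0.1, size=500)

mean_err = errors.mean()
# Sample standard deviation (ddof=1) of a single item's error ...
std_err = errors.std(ddof=1)
# ... and of the mean over the whole test set.
std_of_mean = std_err / np.sqrt(len(errors))
print(f"error {mean_err:.4f} +/- {std_of_mean:.4f}")
```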

- Important: Each item in the test set constitutes one experiment/observation. This allows statistical analysis of the error estimates.

## Overtraining and Undertraining

- Exercise last week.
- I have not been able to generate the expected result.
- The deep networks tested produce impressive results with very little training.

- Still, important principle.
- The training data
- contain a limited amount of information about the population
- have some peculiar quirks

- The network has a certain number of DOF (weights) which can be adjusted to store information extracted from the training set.
- Undertraining means insufficient training to absorb the relevant information
- insufficient epochs
- insufficient training set (relative to DOF)

- A large network with many DOF can learn a small dataset completely.
- Overtraining means that the network has learnt more than what generalises
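The principle can be demonstrated without a deep network: a polynomial with as many coefficients as training points (a stand-in for a network with many DOF) reproduces a small noisy training set almost exactly, while generalising worse to new data. A sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
# Small noisy training set drawn from a simple underlying function.
x_train = rng.uniform(-1, 1, 10)
y_train = np.sin(3 * x_train) + rng.normal(0, 0.1, 10)
x_test = rng.uniform(-1, 1, 200)
y_test = np.sin(3 * x_test) + rng.normal(0, 0.1, 200)

results = {}
for degree in (3, 9):
    # Degree 9 has 10 coefficients -- enough to interpolate all 10 points.
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    results[degree] = (train_mse, test_mse)
    print(f"degree {degree}: train MSE {train_mse:.2e}, test MSE {test_mse:.2e}")
```

The near-zero training error of the degree-9 fit is exactly the "learnt the quirks" behaviour described above.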

- Sometimes, regularisation techniques are used to remove (zero out) DOF with little impact
- Occam’s Razor

## Normalisation

- Network layers add up coefficients from different sources
- Large numbers contribute a lot.
- Numbers with a small range contribute little.

- Images are simple. One range for all pixels.
- Some datasets combine data with different ranges.
- If some have range \(\pm1\), some have range \((0,10^{-5})\) and others \((0,10^5)\), then small numbers are negligible and are effectively ignored.

- Scaling is standard procedure.
- Scale each column of the training data to \((0,1)\) (or \(\pm1\)).
- Store the scaling function and apply it to the test set and all future data when making predictions.
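A minimal sketch of this procedure, with hand-made columns of very different ranges; the min/max learned from the training set is stored and reused unchanged on new data:

```python
import numpy as np

def fit_minmax(X):
    """Learn per-column min/max from the training data only."""
    return X.min(axis=0), X.max(axis=0)

def apply_minmax(X, lo, hi):
    """Scale columns to (0, 1) using the stored training-set parameters."""
    return (X - lo) / (hi - lo)

# Hypothetical features with wildly different ranges.
X_train = np.array([[1e-5, 1e5],
                    [2e-5, 5e4],
                    [0.0,  0.0]])
lo, hi = fit_minmax(X_train)

X_test = np.array([[1.5e-5, 2e4]])
print(apply_minmax(X_train, lo, hi))
print(apply_minmax(X_test, lo, hi))  # same transform applied to new data
```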

- This also applies to weights in the network.
- Weights should be balanced between layers.
- Batch Normalisation.
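A sketch of the core batch-normalisation computation: per-feature standardisation over the batch, followed by a learnable rescale and shift (a real layer would also keep running statistics for use at prediction time, omitted here):

```python
import numpy as np

def batch_norm(X, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise each feature over the batch, then rescale by gamma and shift by beta."""
    mean = X.mean(axis=0)
    var = X.var(axis=0)
    X_hat = (X - mean) / np.sqrt(var + eps)  # eps avoids division by zero
    return gamma * X_hat + beta

# Two features on very different scales become comparable after normalisation.
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])
out = batch_norm(X)
print(out.mean(axis=0))  # roughly zero mean per column
print(out.std(axis=0))   # roughly unit spread per column
```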

- There are too many normalisation and regularisation techniques to learn them all before starting.
- New techniques keep emerging.
- Gain some experience and return to extend your repertoire.

# Debrief

- Demo: python module/script