Evaluation and Statistics
Reading: Szeliski (2022), Computer Vision: Algorithms and Applications, 2nd ed., Sections 5.1-5.2
Exercises
Exercise 1
Review the machine learning systems you studied last week and calculate the confusion matrix for each of the systems (a minimal sketch of how to tabulate such a matrix follows after this exercise).
- Are the errors reasonably balanced?
- Are any classes particularly difficult to detect?
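A minimal sketch of how such a confusion matrix could be tabulated with NumPy; `y_true` and `y_pred` are hypothetical arrays standing in for the true labels and the predictions of one of your systems:

```python
import numpy as np

# Hypothetical example data: replace with the labels and predictions
# from one of your own systems from last week.
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 2])

n_classes = max(y_true.max(), y_pred.max()) + 1
confusion = np.zeros((n_classes, n_classes), dtype=int)

# Rows index the true class, columns the predicted class.
for t, p in zip(y_true, y_pred):
    confusion[t, p] += 1

print(confusion)
# Row sums show how many items of each class were tested;
# off-diagonal entries reveal which classes are confused with which.
```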
Exercise 2
Review again each of the systems from last week (a sketch of the calculations follows after this list).
- Calculate the false positive and false negative rates.
- Estimate the standard deviation of the error rates.
- Assess the quality of the machine learning system. Can you be confident that the error probability is satisfactory?
- How large does the test set have to be to make a confident assessment?
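A minimal sketch of these calculations for a binary classifier; the counts `TP, FP, TN, FN` are invented placeholders for the values read off your own confusion matrix:

```python
import numpy as np

# Hypothetical counts; substitute the values from your own confusion matrix.
TP, FP, TN, FN = 85, 7, 90, 18

# False positive rate: fraction of true negatives wrongly flagged positive.
fpr = FP / (FP + TN)
# False negative rate: fraction of true positives that were missed.
fnr = FN / (FN + TP)

# Each test item is a Bernoulli trial, so the number of errors is
# binomially distributed and the estimated rate has standard deviation
# approximately sqrt(p * (1 - p) / N).
n_neg = FP + TN
n_pos = FN + TP
fpr_std = np.sqrt(fpr * (1 - fpr) / n_neg)
fnr_std = np.sqrt(fnr * (1 - fnr) / n_pos)
print(f"FPR = {fpr:.3f} +/- {fpr_std:.3f}, FNR = {fnr:.3f} +/- {fnr_std:.3f}")

# To bound the uncertainty below some target (say +/-0.01 on a rate around p),
# solve sqrt(p * (1 - p) / N) <= target for N.
p, target = 0.1, 0.01
print("Required test items per class:", int(np.ceil(p * (1 - p) / target**2)))
```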
See also exercises on Regression intended for tomorrow.
Briefing
Recap
What did we learn last week?
- Supervised learning
- Loss function – Cross-Entropy
- \(E(\mathbf{w}) = -\sum_n \ln p_{n,t_n}\) (a small numerical sketch follows after this list)
- \(p_{n,k}\) is the network's estimated probability that object \(n\) has class \(k\)
- \(t_n\) is the true class of object \(n\)
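A small numerical sketch of the cross-entropy loss above; the probability matrix `p` and class vector `t` are invented, and the network outputs are assumed to have already been converted to probabilities (e.g. by a softmax):

```python
import numpy as np

# p[n, k]: the network's estimated probability that object n has class k.
p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4]])
# t[n]: the true class of object n.
t = np.array([0, 1, 2])

# E(w) = -sum_n ln p_{n, t_n}
E = -np.sum(np.log(p[np.arange(len(t)), t]))
print(E)  # small when the true classes are given high probability
```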
Regression problem - Gravitational Lensing
- Lensing model
- Source - size \(\sigma\) and position \((x,y)\)
- Lens - Einstein radius \(R_E\)
- Distorted Image
- Four parameters determine the distorted image.
- Can we recover these four parameters from the image?
- Instead of a discrete class, we want the network to predict \(\sigma,x,y,R_E\)
- Loss function is Mean Squared Error
- \(\mathsf{MSE} = \frac14\left[(\sigma-\hat\sigma)^2+(x-\hat x)^2 +(y - \hat y)^2+(R_E-\hat R_E)^2\right]\)
- Note, normalisation can vary.
- The starting point is the sum of squared errors (SSE).
- We may or may not normalise by dividing by the number of data points and/or prediction parameters (a minimal sketch of the MSE computation follows after this list).
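A minimal sketch of the MSE computation; the parameter vectors are invented and assumed to be ordered as \((\sigma, x, y, R_E)\):

```python
import numpy as np

# True and predicted (sigma, x, y, R_E) for one image; invented values.
params_true = np.array([1.2, 0.3, -0.1, 2.5])
params_pred = np.array([1.0, 0.4, -0.2, 2.7])

# Sum of squared errors, then normalise by the number of parameters (4).
sse = np.sum((params_true - params_pred) ** 2)
mse = sse / len(params_true)
print(sse, mse)
```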
Evaluation
- Confusion Matrix
- false positive, false negative
- Accuracy
\[ \mathsf{Acc} = \frac{ \mathsf{TP} + \mathsf{TN} }{ \mathsf{TP}+\mathsf{FN} + \mathsf{TN}+\mathsf{FP} }\]
- Warning: accuracy can be misleading on biased datasets
- Other heuristics, such as the \(F_1\) score
\[ F_1 = \frac{ 2\,\frac{\mathsf{TP}}{\mathsf{TP}+\mathsf{FP}} \cdot \frac{\mathsf{TP}}{\mathsf{TP}+\mathsf{FN}} }{ \frac{\mathsf{TP}}{\mathsf{TP}+\mathsf{FP}} + \frac{\mathsf{TP}}{\mathsf{TP}+\mathsf{FN}} }\]
- TP/TN/FP/FN are Stochastic Variables
- Binomially Distributed
- Regression: absolute or squared error
- Also a Stochastic Variable (depends on the data drawn randomly from the population)
- Mean over a large dataset gives a reasonable estimator
- Standard Deviation can be estimated using the sample standard deviation
- Important: each item in the test set constitutes one experiment/observation. This allows statistical estimation of the error rates and their uncertainties (a small sketch follows below).
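A minimal sketch tying these quantities together; the classification counts and the per-item squared errors are invented placeholders:

```python
import numpy as np

TP, TN, FP, FN = 40, 45, 5, 10  # invented counts

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)
print(f"accuracy = {accuracy:.3f}, F1 = {f1:.3f}")

# Regression: each test item gives one squared error, i.e. one observation.
sq_err = np.array([0.05, 0.12, 0.02, 0.30, 0.08])  # invented
mean_err = sq_err.mean()
# Sample standard deviation (ddof=1) of the per-item errors; the
# uncertainty of the mean shrinks as 1/sqrt(N).
std_err = sq_err.std(ddof=1)
sem = std_err / np.sqrt(len(sq_err))
print(f"mean squared error = {mean_err:.3f} +/- {sem:.3f}")
```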
Overtraining and Undertraining
- Exercise last week.
- I have not been able to generate the expected result.
- The deep networks tested produce impressive results with very little training.
- Still, important principle.
- The training data
- contain a limited amount of information about the population
- have some peculiar quirks
- The network has a certain number of degrees of freedom (DOF), i.e. weights, which can be adjusted to store information extracted from the training set.
- Undertraining means insufficient training to absorb the relevant information
- insufficient epochs
- insufficient training set (relative to DOF)
- A large network with many DOF, can learn a small dataset completely.
- Overtraining means that the network has learnt features of the training set that do not generalise (see the sketch after this list)
- Sometimes, regularisation techniques are used to remove (zero out) DOF with little impact
- Occam’s Razor
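A small illustration of the principle, using a high-degree polynomial as a stand-in for a network with many DOF: with enough free parameters the training set is reproduced almost exactly, but fresh data drawn from the same population are predicted poorly. All data here are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small training set drawn from a simple underlying function plus noise.
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + 0.2 * rng.standard_normal(10)
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test) + 0.2 * rng.standard_normal(100)

for degree in (3, 9):  # few vs. many degrees of freedom
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")
```

The degree-9 fit has enough DOF to pass through all ten noisy training points, so its training error collapses while its test error grows; this is the same overtraining effect described above for networks.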
Normalisation
- Network layers add up weighted contributions from different input sources.
- Large numbers contribute a lot.
- Numbers with a small range contribute little.
- Images are simple. One range for all pixels.
- Some datasets combine data with different ranges.
- If some have range \(\pm1\), some have range \((0,10^{-5})\) and others \((0,10^5)\), then small numbers are negligible and are effectively ignored.
- Scaling is standard procedure.
- Scale each column of the training data to \((0,1)\) (or \(\pm1\)).
- Store the scaling function and apply it to the test set and all future data when making predictions (see the sketch after this list).
- This also applies to weights in the network.
- Weights should be balanced between layers.
- Batch Normalisation.
- Too many normalisation and regularisation techniques to learn all before starting.
- New techniques keep emerging.
- Gain some experience and return to extend your repertoire.
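A minimal sketch of column-wise min-max scaling; the training data are invented, and the essential point is that the scaling parameters are computed from the training set only and then reused for all later data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented training data: three columns with ranges +/-1, ~1e-5 and ~1e5.
X_train = np.column_stack([rng.uniform(-1, 1, 100),
                           rng.uniform(0, 1e-5, 100),
                           rng.uniform(0, 1e5, 100)])

# Fit the scaling on the training data only.
col_min = X_train.min(axis=0)
col_max = X_train.max(axis=0)

def scale(X):
    """Map each column to (0, 1) using the stored training-set ranges."""
    return (X - col_min) / (col_max - col_min)

X_train_scaled = scale(X_train)

# The same stored function is applied to the test set and all future data.
X_test = np.column_stack([rng.uniform(-1, 1, 20),
                          rng.uniform(0, 1e-5, 20),
                          rng.uniform(0, 1e5, 20)])
X_test_scaled = scale(X_test)
print(X_train_scaled.min(axis=0), X_train_scaled.max(axis=0))
```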
Debrief
- Demo: python module/script