---
title: Evaluation and Statistics
---

**Reading**
[Szeliski (2022): Computer Vision: Algorithms and Applications, 2nd ed.](https://szeliski.org/Book/)
Chapter 5.1-5.2 

# Exercises

## Exercise 1

Review the machine learning systems you studied last week.
If you have not already done so, make a plot which shows
the total loss as a function of the number of epochs.

For instance, you can start with empty lists and, for each
epoch, train once and evaluate once, recording the losses,
like this:

```python
import matplotlib.pyplot as plt

trainloss = []    # loss recorded while training each epoch
trainloss2 = []   # training-set loss evaluated after each epoch
testloss = []     # test-set loss evaluated after each epoch

for epoch in range(12):  # loop over the dataset multiple times

    tloss = trainmodel(net, trainloader)
    trainloss.append(tloss)

    trainloss2.append(evalmodel(net, trainloader))
    testloss.append(evalmodel(net, testloader))

x = list(range(len(testloss)))
plt.plot(x, trainloss, "b", x, trainloss2, "k", x, testloss, "r")
plt.savefig("plot.svg")
```

The functions `trainmodel()` and `evalmodel()` are defined in
[NetForStatistics.py](Python/NetForStatistics.py).
Obviously, you need to initialise the network and the datasets
before you run the code above, again using functions from
`NetForStatistics`, like this:

```python
(trainloader,testloader) = getDataset()
net = Net()
```

You may want to tweak the device 
(you can use `net = Net("cpu")` with the given code)
and the number of epochs.
You may also adapt the code to show the plot interactively
instead of writing it to the file `plot.svg`.

### Questions for reflection

1.  Which curve is which in the plot (`plot.svg`)?
    (Read the code to know.)
2.  How does the loss behave differently on the training and test sets?
3.  What is the difference between the two training set curves,
    `trainloss` and `trainloss2`?


## Exercise 2

Review the machine learning systems you studied last week.
Calculate the confusion matrix for each of the systems.

+ Are the errors reasonably balanced?
+ Are any classes particularly difficult to detect?
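A confusion matrix can be computed by hand from the true and predicted labels; here is a minimal sketch in plain Python, with made-up labels for three classes:

```python
def confusion_matrix(true, pred, nclasses):
    # Rows index the true class, columns the predicted class.
    m = [[0] * nclasses for _ in range(nclasses)]
    for t, p in zip(true, pred):
        m[t][p] += 1
    return m

# Toy example: seven objects, three classes
true = [0, 0, 1, 1, 2, 2, 2]
pred = [0, 1, 1, 1, 2, 0, 2]
print(confusion_matrix(true, pred, 3))
```

The diagonal counts correct predictions; off-diagonal entries show which classes get confused with which.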

## Exercise 3

Review again each of the systems from last week.

1. Calculate the false positive and false negative rates.
2. Estimate the standard deviation of the error rates.
3. Assess the quality of the machine learning system.
   Can you be confident that the error probability is satisfactory?
4. How large does the test set have to be to make a confident assessment? 
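For a binary classifier, the rates and their binomial standard deviations might be sketched like this (the counts below are made up):

```python
import math

def error_rates(tp, fp, tn, fn):
    """False positive/negative rates with binomial standard deviations."""
    fpr = fp / (fp + tn)   # fraction of actual negatives flagged positive
    fnr = fn / (fn + tp)   # fraction of actual positives missed
    # Each rate is a binomial proportion; its standard deviation is
    # estimated as sqrt(p*(1-p)/n), where n counts the relevant test items.
    sd_fpr = math.sqrt(fpr * (1 - fpr) / (fp + tn))
    sd_fnr = math.sqrt(fnr * (1 - fnr) / (fn + tp))
    return fpr, sd_fpr, fnr, sd_fnr

fpr, sd_fpr, fnr, sd_fnr = error_rates(tp=90, fp=10, tn=190, fn=10)
print(f"FPR = {fpr:.3f} ± {sd_fpr:.3f}")
print(f"FNR = {fnr:.3f} ± {sd_fnr:.3f}")
```

Since the standard deviation shrinks as $1/\sqrt{n}$, the last question amounts to asking how large $n$ must be before the interval is narrow enough for your purposes.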

See also exercises on [Regression]() intended for tomorrow. 

# Briefing

## Recap

*What did we learn last week?*

1. Supervised learning
2. Loss function -- Cross-Entropy
    + $E(\mathbf{w}) = -\sum_n \ln p_{n,t_n}$
    + $p_{n,k}$ is the network's estimated probability that
      object $n$ has the class $k$
    + $t_n$ is the true class of object $n$
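Cross-entropy takes the negative log of the probability the network assigns to the true class; a minimal sketch with a made-up probability matrix:

```python
import math

def cross_entropy(p, t):
    # p[n][k]: estimated probability that object n has class k
    # t[n]:    true class of object n
    return -sum(math.log(p[n][t[n]]) for n in range(len(t)))

# Two objects, three classes (made-up probabilities)
p = [[0.7, 0.2, 0.1],
     [0.1, 0.8, 0.1]]
t = [0, 1]
print(cross_entropy(p, t))
```

Probabilities close to one on the true class give losses close to zero; confident mistakes are penalised heavily.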

## Regression problem - Gravitational Lensing

1.  Lensing model 
    + Source - size $\sigma$ and position $(x,y)$
    + Lens  - Einstein radius $R_E$
    + Distorted Image
2.  Four parameters determine the distorted image.
    + Can we recover these four parameters from the image?
3.  Instead of a discrete class, we want the network to predict
    $\sigma,x,y,R_E$
4.  Loss function is Mean Squared Error
    + $\mathsf{MSE} =
       \frac14\left[(\sigma-\hat\sigma)^2+(x-\hat x)^2
              +(y - \hat y)^2+(R_E-\hat R_E)^2\right]$
    + Note, the normalisation can vary.
        + The starting point is the sum of squared errors (SSE).
        + We may or may not normalise by dividing by the number
          of data points and/or prediction parameters.
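The loss above can be sketched directly as a function of the true and predicted parameter tuples (the values below are made up):

```python
def mse(true, pred):
    # Mean squared error over the four lensing parameters
    # (sigma, x, y, R_E), normalised by the number of parameters.
    return sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true)

true = (1.0, 0.5, -0.5, 2.0)   # sigma, x, y, R_E
pred = (1.1, 0.4, -0.5, 1.8)
print(mse(true, pred))
```

Dropping the division by `len(true)` gives the SSE variant mentioned above.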
    

## Evaluation

+ Confusion Matrix
    + false positive, false negative
+ Accuracy

$$ \mathsf{Acc} = \frac{ \mathsf{TP} + \mathsf{TN} }{
            \mathsf{TP}+\mathsf{FN} + \mathsf{TN}+\mathsf{FP}
         }$$
+ Warning.  Biased datasets
+ Other heuristics

$$ F_1 = \frac{ 2 \cdot \frac{\mathsf{TP}}{\mathsf{TP}+\mathsf{FP}}
          \cdot \frac{\mathsf{TP}}{\mathsf{TP}+\mathsf{FN}}
         }{ \frac{\mathsf{TP}}{\mathsf{TP}+\mathsf{FP}}
            + \frac{\mathsf{TP}}{\mathsf{TP}+\mathsf{FN}}
         }$$
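Both metrics follow directly from the four confusion-matrix counts; $F_1$ is the harmonic mean of precision and recall. A minimal sketch with made-up counts:

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp, fp, fn):
    precision = tp / (tp + fp)   # TP / (TP + FP)
    recall = tp / (tp + fn)      # TP / (TP + FN)
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

# Made-up counts from a binary confusion matrix
print(accuracy(tp=8, tn=6, fp=2, fn=4))   # 0.7
print(f1_score(tp=8, fp=2, fn=4))
```

Note that $F_1$ ignores TN entirely, which is why it is often preferred on biased datasets where TN dominates.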

+ TP/TN/FP/FN are Stochastic Variables
    + Binomially Distributed
+ Regression: absolute or squared error
    + Also a Stochastic Variable 
      (depends on the data drawn randomly from the population)
    + Mean over a large dataset gives a reasonable estimator
    + Standard Deviation can be estimated using the sample
      standard deviation
+ Important: Each item in the test set constitutes one experiment/observation.
  This allows statistical analysis of the estimated error rates.
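For a regression system, each test item contributes one observation of the squared error, and the sample mean and standard deviation give the estimate and its uncertainty. A sketch with made-up per-item errors:

```python
import math

def error_statistics(errors):
    """Mean of the per-item errors and the standard error of that mean."""
    n = len(errors)
    mean = sum(errors) / n
    # Sample standard deviation (n - 1 in the denominator)
    sd = math.sqrt(sum((e - mean) ** 2 for e in errors) / (n - 1))
    # The standard error of the mean shrinks with the test-set size
    return mean, sd / math.sqrt(n)

squared_errors = [0.1, 0.2, 0.15, 0.05, 0.1]   # made-up values
mean, se = error_statistics(squared_errors)
print(f"MSE = {mean:.3f} ± {se:.3f}")
```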

## Overtraining and Undertraining

+ Exercise last week.
    + I have not been able to generate the expected result.
    + The deep networks tested produce impressive results with 
      very little training.
+ Still, important principle.
+ The training data
    + contain a limited amount of information about the population
    + have some peculiar quirks
+ The network has a certain number of DOF (weights) which can
  be adjusted to store information extracted from the training set.
+ Undertraining means insufficient training to absorb 
  the relevant information
    + insufficient epochs
    + insufficient training set (relative to DOF)
+ A large network with many DOF can learn a small dataset completely.
    + Overtraining means that the network has learnt more than
      what generalises
+ Sometimes, regularisation techniques are used to remove 
  (zero out) DOF with little impact
    + Occam's Razor

## Normalisation

+ Network layers add up coefficients from different sources
    + Large numbers contribute a lot.
    + Numbers with a small range contribute little.
+ Images are simple.  One range for all pixels.
+ Some datasets combine data with different ranges.
    + If some have range $\pm1$, some have range
      $(0,10^{-5})$ and others $(0,10^5)$,
      then small numbers are negligible and are effectively
      ignored.
+ Scaling is standard procedure.
    + Scale each column of the training data to $(0,1)$ (or $\pm1$).
    + Store the scaling function and apply it to the test set
      and all future data when making predictions.
+ This also applies to weights in the network.
    + Weights should be balanced between layers.
    + Batch Normalisation.
+ There are too many normalisation and regularisation techniques to
  learn them all before starting.
    + New techniques keep emerging.
    + Gain some experience and return to extend your repertoire.
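The column-wise scaling procedure can be sketched in plain Python: fit the scaling on the training data only, store it, and apply the same transformation later (the data values below are made up):

```python
def fit_scaler(train):
    # Column-wise minimum and maximum, computed on the training data only.
    cols = list(zip(*train))
    return [min(c) for c in cols], [max(c) for c in cols]

def apply_scaler(data, lo, hi):
    # Apply the stored training-set scaling to any dataset; test
    # values may fall slightly outside (0, 1).
    return [[(x - l) / (h - l) for x, l, h in zip(row, lo, hi)]
            for row in data]

# Two columns with wildly different ranges
train = [[1.0, 1e-5], [3.0, 3e-5], [5.0, 5e-5]]
lo, hi = fit_scaler(train)
print(apply_scaler(train, lo, hi))
```

After scaling, both columns lie in $(0,1)$ and contribute on equal terms to the network's weighted sums.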

# Debrief

+ Demo: [python module/script](Python/cnn.py)