---
title: Neural Networks
categories: session
---

The material in this page is meant for 3-4 sessions.
You should be content if you do one exercise in a day; some may
be able to complete two in a day.

+ [Briefing 10 November](ANN)

# Reading

+ [PyTorch Quickstart](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html)
    + the quickstart tutorial is part of [Learn the Basics](https://pytorch.org/tutorials/beginner/basics/intro.html)
+ [Learn the Basics](https://pytorch.org/tutorials/beginner/basics/intro.html)
    + see also the tutorial under Exercise 1. below
+ Szeliski 2022 Chapter 5


# Exercise 1. Basic tutorial.

I have added a couple of exercises to the official
[PyTorch Quickstart](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html).

+ Download and open the (augmented) [tutorial](ann-tutorial.ipynb).
+ Please reflect upon and discuss the questions.

# Exercise 2. Convolutional Neural Networks

1. Complete the [CIFAR-10 Tutorial](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html)
2. Compare the approach to Exercise 1.  What is different?
   What is the same?
   
# Exercise 3. Regression.

One in-house project is to classify images of remote galaxies,
which are often distorted due to the gravity of dark matter.
This effect is known as gravitational lensing.

A sample dataset can be found on
[GitHub](https://github.com/CosmoAI-AES/datasets2022/Exercise2022).
This directory contains

+ A data file, `sphere-pm.csv`
+ 10000 images of distorted galaxies
+ A Python file `Dataset.py` defining a subclass of `Dataset`
  to manage these data

The CSV file has the form
```
index,filename,source,lens,chi,x,y,einsteinR,sigma,sigma2,theta,nterms
"00001",image-00001.png,s,p,50,30,40,19,31,0,0,16
```
The interesting columns are `filename`, which points to the
input image, and
the four output variables `x`, `y`, `einsteinR`, and `sigma`.
The other columns are associated with more advanced problem instances
and should be ignored.
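
As a minimal sketch of how these columns could be pulled out of the file, the following uses only Python's standard `csv` module. The helper name `read_targets` and the in-memory sample are illustrative assumptions; the `Dataset.py` in the repository may well do this differently.

```python
import csv
import io

# Columns treated as regression targets, named as in the CSV header.
TARGET_COLUMNS = ("x", "y", "einsteinR", "sigma")


def read_targets(csv_file):
    """Return a list of (filename, [x, y, einsteinR, sigma]) pairs.

    Hypothetical helper for illustration; all other columns are ignored.
    """
    rows = []
    for record in csv.DictReader(csv_file):
        targets = [float(record[name]) for name in TARGET_COLUMNS]
        rows.append((record["filename"], targets))
    return rows


# The header and sample row shown above, as an in-memory file.
sample = io.StringIO(
    "index,filename,source,lens,chi,x,y,einsteinR,sigma,sigma2,theta,nterms\n"
    '"00001",image-00001.png,s,p,50,30,40,19,31,0,0,16\n'
)
rows = read_targets(sample)
print(rows)
```

For the real data, the same function could be called on `open("sphere-pm.csv", newline="")` instead of the in-memory sample.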

1.  Study the Dataset class `Dataset.py`.  How is the dataset managed?
2.  Use this class to test if you can train a network to determine the
    four outputs for an image.
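
One possible starting point for the second task is sketched below. It is not a reference solution: the class name `RegressionCNN`, the layer sizes, the learning rate, and the single-channel $64\times64$ input are all assumptions made for illustration, and a random batch stands in for the real images. The point is only that regression uses four real-valued outputs and a loss such as `nn.MSELoss` in place of class scores and cross-entropy.

```python
import torch
from torch import nn


class RegressionCNN(nn.Module):
    """Small convolutional network with four real-valued outputs."""

    def __init__(self, n_outputs=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),  # fixed 4x4 feature map for any input size
        )
        self.head = nn.Linear(16 * 4 * 4, n_outputs)

    def forward(self, x):
        return self.head(torch.flatten(self.features(x), start_dim=1))


torch.manual_seed(0)
model = RegressionCNN()
loss_fn = nn.MSELoss()  # mean squared error over the four outputs
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Random stand-in batch (assumed shape): 8 greyscale 64x64 images,
# each with 4 target values.
images = torch.rand(8, 1, 64, 64)
targets = torch.rand(8, 4)

for step in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(images), targets)
    loss.backward()
    optimizer.step()
    print(step, loss.item())
```

In an actual experiment the random tensors would be replaced by batches from a `DataLoader` wrapping the class in `Dataset.py`, and the number of input channels adjusted to match the images.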

# Exercise 4.  Own Data (Optional)

Doing a tutorial is good, but it is of little use if
you can only use the sample data.
The goal of this exercise is to learn to manage other
datasets.
Making your own training data may be difficult, because of the
large quantity of data that you need.

Please be aware that deep learning is usually extremely compute
intensive, and you can easily run into a problem which takes days
to compute.  Small images, as suggested in step 3 below, help to keep
the computation manageable.

If you want to give it a try, I would recommend the following.

1.  Team up with everybody else who wants to do this problem.
2.  Each team member makes a set of hand-drawn digits, at least
    ten versions of each digit.
3.  Digitise the digits into image files.  Make sure that everybody
    uses the same resolution.  
    + The resolution must be large enough to make digits recognisable,
      but small enough to save computational power.
    + Somewhere between $28\times28$ (same as the MNIST dataset)
      and $64\times64$ may be considered.
4.  Gather the images into a dataset with their labels (digit value).
    You may use Exercise 3 as an example.
5.  Merge and share the dataset.
6.  Train and test a neural network to read the digits.
7.  Try different networks, at least the ones you have used in previous
    exercises.
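
Step 4 can be sketched with the standard library alone. The example below assumes a hypothetical file naming scheme, `<digit>-<author>-<nn>.png`, which the team would have to agree on; the function name `build_index` and the two-column CSV layout are likewise illustrative choices, loosely modelled on the CSV file in Exercise 3.

```python
import csv
import tempfile
from pathlib import Path

DIGITS = set("0123456789")


def build_index(image_dir, index_path):
    """Write a CSV with one (filename, label) row per PNG image.

    Assumes the hypothetical naming scheme <digit>-<author>-<nn>.png,
    so the label is the text before the first hyphen.
    """
    image_dir = Path(image_dir)
    rows = []
    for path in sorted(image_dir.glob("*.png")):
        label = path.name.split("-", 1)[0]
        if label in DIGITS:  # skip files that do not follow the scheme
            rows.append((path.name, int(label)))
    with open(index_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "label"])
        writer.writerows(rows)
    return rows


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("3-alice-01.png", "7-bob-02.png", "notes.txt"):
        (Path(tmp) / name).touch()
    index = build_index(tmp, Path(tmp) / "index.csv")
    index_text = (Path(tmp) / "index.csv").read_text()
print(index)
```

Keeping the labels in a single index file makes merging straightforward: each member's rows can simply be appended, provided the file names do not collide.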

# Exercise 5.  More examples (optional)

Other datasets may be found at
[this collection](https://paperswithcode.com/datasets?task=image-classification)
if you want to try other variants.
Download one of them and try to solve the classification problem.