---
title: Object Detection
categories: session
---

# Reading

+ Szeliski (2021) Chapter 6, particularly Section 6.3
+ Tutorials cited below.

# Briefing

## Recap from last session

1.  How far did you get with statistical estimation?

See [Statistics]().

## Object Detection

| Classification | Detection | Segmentation |
| :-             | :-        | :-           |
| ![cat 0.92](https://res.cloudinary.com/practicaldev/image/fetch/s--TVyKMU1e--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/l7qo0izq963numv9jxue.jpg) | ![bounding box](https://res.cloudinary.com/practicaldev/image/fetch/s--qE-I2tss--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/02c7l1o7haf7wv9f17qr.jpg) | ![pixels](https://res.cloudinary.com/practicaldev/image/fetch/s--9EVSzsw4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/y09uxmlrcgvfj8a87wol.jpg) |

### The YOLO API

+ Pre-trained model
+ Packaged data sets
+ Dataset API: each dataset is defined by a YAML file, like this:

```yaml
# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/coco8  # dataset root dir
train: images/train  # train images (relative to 'path') 4 images
val: images/val  # val images (relative to 'path') 4 images
test:  # test images (optional)

# Classes (80 COCO classes)
names:
  0: person
  1: bicycle
  2: car
  ...
  77: teddy bear
  78: hair drier
  79: toothbrush
```
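The YAML file only names the image directories; the labels are found by path. As I understand the Ultralytics convention, each image under an `images/` directory has a plain-text label file with the same stem under the parallel `labels/` directory. The helper `label_path_for` below is a hypothetical illustration of that mapping, not part of the Ultralytics API.

```python
from pathlib import PurePosixPath


def label_path_for(image_path: str) -> str:
    """Map e.g. 'coco8/images/train/x.jpg' to 'coco8/labels/train/x.txt'.

    Hypothetical helper: swaps the last 'images' directory component for
    'labels' and the file extension for '.txt'. Raises ValueError if the
    path has no 'images' component.
    """
    parts = list(PurePosixPath(image_path).parts)
    # index of the last 'images' component in the path
    idx = len(parts) - 1 - parts[::-1].index("images")
    parts[idx] = "labels"
    return str(PurePosixPath(*parts).with_suffix(".txt"))


print(label_path_for("../datasets/coco8/images/train/000000000009.jpg"))
```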

+ Retrain the pre-trained models with custom data to tune them
  to particular applications
+ [Data Augmentation](https://medium.com/red-buffer/apply-data-augmentation-on-yolov5-yolov8-dataset-958e89d4bc5d)
    + generate additional image variants to increase the training set
    + change resolution, contrast, light
    + flip and rotate
    + mosaics
    + This can be automated, giving a free boost to the size of the dataset
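Flipping is one of the simplest augmentations to reason about, because the labels must be transformed along with the pixels. A minimal sketch, assuming YOLO-format labels `(class, x_center, y_center, width, height)` normalised to [0, 1] and an image represented as a list of pixel rows; `hflip` is a hypothetical name. In practice Ultralytics can apply augmentations like this automatically during training.

```python
def hflip(image, labels):
    """Mirror an image left-right and adjust its YOLO labels to match.

    image:  list of rows, each row a list of pixel values
    labels: list of (cls, x_center, y_center, width, height), normalised to [0, 1]
    """
    flipped_image = [row[::-1] for row in image]  # reverse every row
    # Only the horizontal centre moves; y, width and height are unchanged.
    flipped_labels = [(c, 1.0 - x, y, w, h) for (c, x, y, w, h) in labels]
    return flipped_image, flipped_labels


img = [[1, 2, 3],
       [4, 5, 6]]
new_img, new_labels = hflip(img, [(0, 0.25, 0.5, 0.1, 0.2)])
print(new_img)     # [[3, 2, 1], [6, 5, 4]]
print(new_labels)  # [(0, 0.75, 0.5, 0.1, 0.2)]
```

Flipping twice returns the original image and labels, which is a cheap sanity check for any geometric augmentation.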
 
# Exercises

Several tutorials and documentation pages are cited below.
I **strongly** recommend that you ignore them at first, focusing
on attempting the tasks given.  Then when you **see what happens**
you can look up the tutorials and documentation for further ideas.
Even so, there is probably more tutorial material than you have time
to read, so read what inspires you and don't read at the expense of 
your own tests.

Whether you complete all the tasks or not, **before** you call it a day,
look through the debrief questions at the end, and see what you can answer.

Note that training will be painfully slow on a CPU.
If you are not able to run YOLO on CUDA, it is recommended that you
focus on skimming the tutorials and
watching video demos.
You can check whether CUDA is available in Python by running
```python
import torch
print(torch.cuda.is_available())  # True if PyTorch can use a CUDA GPU
```

## YOLO Quick start

See also the [Official Ultralytics tutorial](https://docs.ultralytics.com/quickstart/#install-ultralytics).


It is recommended to install torch first, and then ultralytics.
This is to make sure that `torch` is installed correctly for your 
hardware and drivers.
```sh
pip install torch
pip install ultralytics
```

YOLO has a command-line interface (CLI): the `yolo` command, run directly in the shell.
We will use the Python API instead.
See also the
[Official Python Guide](https://docs.ultralytics.com/usage/python/).
The object of particular interest is the `YOLO` class:
```python
from ultralytics import YOLO
```

A model can be trained and validated from scratch as follows:
```python
model = YOLO('yolov8n.yaml')
results = model.train(data='coco128.yaml', epochs=3)
results = model.val()
```
This downloads the 128 images of the coco128 dataset.
Interestingly, the dataset defines both a training and a validation set, but they
are identical.
Note the last line of output; it tells you where the performance 
reports are, somewhere under a `runs` subdirectory.  Open the files
to look.
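If you lose track of which run is the newest, a small sketch like the one below can help. It assumes the default layout where each run gets its own directory two levels down, such as `runs/detect/train`, `runs/detect/train2`, and so on, and simply picks the most recently modified one; `latest_run` is a hypothetical helper, not an Ultralytics function.

```python
from pathlib import Path


def latest_run(root="runs"):
    """Return the most recently modified run directory under root, or None.

    Assumes runs live two levels down, e.g. runs/detect/train2.
    """
    runs = [p for p in Path(root).glob("*/*") if p.is_dir()]
    if not runs:
        return None
    return max(runs, key=lambda p: p.stat().st_mtime)


run = latest_run()
if run is not None:
    for f in sorted(run.iterdir()):  # list the reports and plots in that run
        print(f.name)
```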

1.  *What performance does the validation show?  Is it satisfactory?*
    (Look in particular at the confusion matrix, and don't despair if
    the results are awful.)
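The confusion matrix is easier to interpret once you turn it into per-class precision and recall. A minimal sketch, assuming a square matrix of counts where rows are predicted classes and columns are true classes (check the axis labels on your plot, since conventions differ between tools); `precision_recall` is a hypothetical name.

```python
def precision_recall(matrix):
    """Per-class (precision, recall) from a confusion matrix of counts.

    Assumes matrix[p][t] counts objects of true class t predicted as class p.
    """
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[k])                    # row sum: everything predicted as k
        actual_k = sum(matrix[p][k] for p in range(n))  # column sum: everything truly k
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        scores.append((precision, recall))
    return scores


cm = [[8, 2],
      [0, 6]]
print(precision_recall(cm))  # [(0.8, 1.0), (1.0, 0.75)]
```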

It is normally recommended to use a pre-trained model, e.g. 
```python
model = YOLO('yolov8n.pt')
```
Starting with this pre-trained `model` object, rerun the training
and validation steps above.

2.  *What performance do you get?  Has it improved?*
    (Hopefully, the performance is better than above.)

The model can also be applied to arbitrary images, and exported to file.
```python
results = model('https://ultralytics.com/images/bus.jpg')
path = model.export(format='onnx')  # returns the path of the exported file
```

### Reflection

3.  Look in your file system.  *What files has YOLO downloaded?*
    Start with your working directory and look for directories
    that you have not made yourself.
    You may have to look at the output to find the right location.
4.  Inspect the files defining the datasets.
    The image files should speak for themselves.
    Plain text files should contain labels and bounding boxes.
    Look also at the run results.
    *What information do you find?*
5.  Unfortunately, the `.yaml` file that defines the dataset is not stored with the downloaded
    data; it ships inside the installed `ultralytics` package.
    If you are curious, you can search online for `coco128.yaml`; you may not find
    exactly the right file, but you will see what a data definition file looks like.
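A YOLO detection label file has one line per object: a class index followed by the box centre, width and height, all normalised by the image size. A minimal sketch of reading one back into pixel corner coordinates, assuming you know the image width and height; `read_labels` is a hypothetical helper.

```python
def read_labels(text, img_w, img_h):
    """Parse YOLO label lines 'cls xc yc w h' into (cls, x0, y0, x1, y1) in pixels."""
    boxes = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        cls, xc, yc, w, h = line.split()  # detection labels have exactly five fields
        xc, w = float(xc) * img_w, float(w) * img_w  # horizontal values scale with width
        yc, h = float(yc) * img_h, float(h) * img_h  # vertical values scale with height
        boxes.append((int(cls), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2))
    return boxes


sample = "0 0.5 0.5 0.25 0.5\n"
print(read_labels(sample, 640, 480))  # [(0, 240.0, 120.0, 400.0, 360.0)]
```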
  

## YOLO Predictions

See also [predict in the documentation](https://docs.ultralytics.com/modes/predict/#boxes).

You can run your model on a single image like this.
```python
results = model('https://ultralytics.com/images/bus.jpg')
```
The image is automatically downloaded over HTTP, so you can
find it on your file system and inspect it.  Instead of a URL, you can
give a local file name, or a list of images or image files.

1.  *What kind of object is the `results`?*

When you pass in a single image, the returned list contains a single item.

```python
print(type(results))
print(len(results))
r = results[0]
print(type(r))
```
This should answer question 1.
Now we can print the result object:
```python
print(r)
```
As you can see, there are many fields, but only some are of interest here.
For segmentation, we would have looked at `masks`, and
for classification at `probs`.  Now we want to look at the boxes.

```python
print(r.boxes)
```
This defines a number of bounding boxes with associated labels.
First look at the `cls` field:
```python
print(r.boxes.cls)
```
This should give one label for each object detected.
You can look up what the labels mean in the `r.names` dictionary, which maps class indices to class names.
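Since `cls` holds class indices stored as floating-point numbers, a common next step is to translate them into names and count them. A sketch under the assumption that you pass in plain Python values, e.g. `r.boxes.cls.tolist()` and `r.names`; the helper `count_detections` is hypothetical, and the name table below is made up so the example runs without a model.

```python
from collections import Counter


def count_detections(cls_ids, names):
    """Count detections per class name.

    cls_ids: class indices, e.g. r.boxes.cls.tolist() (floats such as 0.0)
    names:   mapping from class index to class name, e.g. r.names
    """
    return Counter(names[int(c)] for c in cls_ids)


# Made-up example values, not real model output:
names = {0: "person", 5: "bus"}
print(count_detections([0.0, 5.0, 0.0, 0.0], names))  # Counter({'person': 3, 'bus': 1})
```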

2. *What kind of objects have been detected?  Can you see them in the image too?*

The actual bounding boxes are given in two pixel formats, `xyxy` (corner coordinates) and
`xywh` (centre, width and height), plus the normalised versions `xyxyn` and `xywhn`.
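The two pixel formats carry the same information. As I understand the Ultralytics convention, `xyxy` gives the top-left and bottom-right corners, `xywh` gives the box centre plus width and height, and the normalised variants divide by the image width and height. A minimal pure-Python sketch of the conversions; the function names are hypothetical.

```python
def xyxy_to_xywh(x0, y0, x1, y1):
    """Corner format -> (x_center, y_center, width, height)."""
    return (x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0


def xywh_to_xyxy(xc, yc, w, h):
    """(x_center, y_center, width, height) -> corner format."""
    return xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2


def normalise_xywh(xc, yc, w, h, img_w, img_h):
    """Divide horizontal values by the image width and vertical ones by its height."""
    return xc / img_w, yc / img_h, w / img_w, h / img_h


box = xyxy_to_xywh(10, 20, 30, 60)
print(box)                             # (20.0, 40.0, 20, 40)
print(xywh_to_xyxy(*box))              # (10.0, 20.0, 30.0, 60.0)
print(normalise_xywh(*box, 100, 200))  # (0.2, 0.2, 0.2, 0.2)
```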
You can draw the bounding boxes on the image using OpenCV
(based on this [example](https://stackoverflow.com/questions/63923800/drawing-bounding-rectangles-around-multiple-objects-in-binary-image-in-python)):

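```python
import cv2

img = cv2.imread('bus.jpg')  # the image downloaded by the prediction above
# First detected box in corner format; .cpu() is needed if the model ran on the GPU
x0, y0, x1, y1 = r.boxes.xyxy[0].cpu().numpy()
# OpenCV wants integer pixel coordinates; colour is BGR (red), line width 2 px
cv2.rectangle(img, (int(x0), int(y0)), (int(x1), int(y1)), (0, 0, 255), 2)
cv2.imwrite('bus1.jpg', img)
```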

3.  *Look at the result image `bus1.jpg`.  Is this a good prediction?*

4. (Optional)
   Write a function which loops through all the boxes and annotates the image.
   Check all the detection results against the image.
5. (Optional)
   Test the model on one of your own images, or something else not coming from YOLO or
   Ultralytics.
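One way to put a number on how good a box is, is intersection over union (IoU): the overlap between a predicted box and a true box divided by the area they cover together. It is the measure behind the mAP50 and mAP50-95 figures in the validation output. A minimal sketch for two boxes in `xyxy` format; the true box would have to come from your own annotation, since the prediction output does not contain it.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])    # top-left corner of the overlap
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])    # bottom-right corner of the overlap
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)  # zero if the boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 50 / 150, about 0.333
```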

## Working on your own tracker

Return to the object tracker that you made earlier.
Can you get YOLO to detect the object?

### (1)  Simple test

Take a single frame from one of your test videos and try to detect your object
in it, using the model which you trained on COCO above.

+ *How well does it find your object?  What class does it predict?*

### (2)  Train on custom data

Take a couple of images of your object and annotate them.
Test one of these tools:

+ [Five best image annotation tools](https://blog.roboflow.com/best-image-annotation-tools/)

See also this [Tutorial on custom dataset](https://towardsdatascience.com/how-to-train-a-custom-object-detection-model-with-yolo-v5-917e9ce13208), which includes a video.

You need to define a dataset with your images, following the official
[Description of the dataset format](https://docs.ultralytics.com/datasets/detect/)
for YOLO.
Use the downloaded datasets as examples of how to lay it out.
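For a custom dataset, the YAML file can be very short. A minimal sketch that creates the directories and writes one, assuming the same layout as the COCO examples (`images/train`, `images/val`, with matching `labels/...` directories); `write_dataset_yaml`, the default file name `data.yaml`, and the class names are hypothetical placeholders to replace with your own.

```python
from pathlib import Path


def write_dataset_yaml(root, class_names, yaml_file="data.yaml"):
    """Create images/ and labels/ directories under root and write a dataset YAML file.

    Returns the path of the written YAML file.
    """
    root = Path(root)
    for split in ("train", "val"):
        (root / "images" / split).mkdir(parents=True, exist_ok=True)
        (root / "labels" / split).mkdir(parents=True, exist_ok=True)
    lines = [
        f"path: {root.resolve()}  # dataset root dir",
        "train: images/train  # relative to 'path'",
        "val: images/val",
        "names:",
    ] + [f"  {i}: {name}" for i, name in enumerate(class_names)]
    out = root / yaml_file
    out.write_text("\n".join(lines) + "\n")
    return out
```

For example, `write_dataset_yaml('datasets/mytracker', ['my_object'])` would give a file you could then pass as `data='datasets/mytracker/data.yaml'` to `model.train`, once your images and label files are copied into the new directories.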

When you have your dataset, try to train the network starting with the pre-trained
model used above. 

### (3)  Test again

As in the initial test, try to detect the object in a couple of
frames.  How well does it perform now?


## (Optional) Other datasets

Test your model on a different dataset.
The easiest would be to test on other 
[datasets prepared for YOLO](https://docs.ultralytics.com/datasets/detect/#usage).

## (Optional) On the detection of multiple shapes

Source: [simple tutorial](https://towardsdatascience.com/object-detection-with-neural-networks-a4e2c46b4491) with Keras, detecting shapes

Goal: understand how machine learning models can predict multiple objects in one input image

Note that this tutorial uses Keras instead of PyTorch.

1.  Clone the git repo from the tutorial `git clone https://github.com/jrieke/shape-detection.git`
2.  The first notebook `single-rectangle.ipynb` shows how to generate the training set
    and train the model to detect a single rectangle. 
    + Go through the notebook and add explanations to the code.
3.  The second notebook `two-rectangles.ipynb` shows how to generate images and train for
    two rectangles. 
    + Go through the notebook and add explanations to the code.
4.  Another notebook `multiple-rectangles.ipynb` goes to three rectangles. 
    + Go through the notebook and add explanations to the code.

# Debrief - take-aways

1. What is the difference between classification, detection, and segmentation?
2. What does the prediction output look like when you use object detection?
3. What could you use YOLO for in a future project?
4. Where would you start reading up when you need YOLO?
5. How would you plan an implementation project requiring object detection?
   What are the hard parts that require new knowledge?
   What parts will take the most time?
6. Would you use pre-trained models, custom data, or both? Why?