Object Detection

Reading

  • Szeliski (2021), Chapter 6, particularly Section 6.3
  • Tutorials cited below.

Briefing

Recap from last session

  1. How far did you get with statistical estimation?

See Statistics.

Object Detection

  • Classification: a class label with a confidence score, e.g. cat 0.92
  • Detection: a bounding box around each detected object
  • Segmentation: the pixels belonging to each object

The YOLO API

  • Pre-trained model
  • Packaged data sets
  • Dataset API; a dataset is defined by a YAML file, like this:
# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/coco8  # dataset root dir
train: images/train  # train images (relative to 'path') 4 images
val: images/val  # val images (relative to 'path') 4 images
test:  # test images (optional)

# Classes (80 COCO classes)
names:
  0: person
  1: bicycle
  2: car
  ...
  77: teddy bear
  78: hair drier
  79: toothbrush
  • Retrain the pre-trained models with custom data to tune them to particular applications
  • Data Augmentation
    • generate additional image variants to increase the training set
    • change resolution, contrast, light
    • flip and rotate
    • mosaics
    • This can be automated, i.e. a free boost to the size of the training set (see the sketch below)
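
A minimal sketch of such variants using OpenCV (file names and parameter values are arbitrary examples; note that geometric changes such as flips and rotations also require transforming the bounding-box labels, which is why automating augmentation inside the training pipeline is convenient):

import cv2

img = cv2.imread('train_image.jpg')                        # any training image
flipped = cv2.flip(img, 1)                                 # horizontal flip
rotated = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)         # 90-degree rotation
brighter = cv2.convertScaleAbs(img, alpha=1.2, beta=30)    # contrast and brightness change
smaller = cv2.resize(img, None, fx=0.5, fy=0.5)            # halved resolution
for name, variant in [('flip', flipped), ('rot', rotated), ('bright', brighter), ('small', smaller)]:
    cv2.imwrite(f'train_image_{name}.jpg', variant)        # write each variant next to the original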

Exercises

Several tutorials and documentation pages are cited below. I strongly recommend that you ignore them at first, focusing on attempting the tasks given. Then when you see what happens you can look up the tutorials and documentation for further ideas. Even so, there is probably more tutorial material than you have time to read, so read what inspires you and don’t read at the expense of your own tests.

Whether you complete all the tasks or not, before you call it a day, look through the debrief questions at the end, and see what you can answer.

Note that performance will be very poor on a CPU. If you are not able to run YOLO on CUDA, it is recommended that you focus on skimming the tutorials and watching video demos. You can check if CUDA is available in Python by running

import torch
torch.cuda.is_available()

YOLO Quick start

See also the Official Ultralytics tutorial.

It is recommended to install torch first, and then ultralytics. This is to make sure that torch is installed correctly for your hardware and drivers.

pip install torch
pip install ultralytics

YOLO has a CLI interface, using the yolo command directly on the command line. What we will use is the Python API. See also the Official Python Guide. The one object of particular interest is the YOLO class (type):
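
It lives in the ultralytics package:

from ultralytics import YOLO   # the class used below for training, validation, and prediction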

A model can be trained and validated from scratch as follows:
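
A minimal sketch, assuming the small yolov8n architecture and a token three epochs (both arbitrary choices here):

from ultralytics import YOLO

model = YOLO('yolov8n.yaml')                            # build an untrained model from an architecture file
results = model.train(data='coco128.yaml', epochs=3)    # train on the small coco128 dataset
metrics = model.val()                                   # validate the freshly trained model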

This downloads the 128 images of the coco128 dataset. Interestingly, this defines both a train and a validation set, but they are identical. Note the last line of the output; it tells you where the performance reports are, somewhere under a runs subdirectory. Open the files and have a look.

  1. What performance does the validation show? Is it satisfactory? (Look in particular at the confusion matrix, and don’t despair if the results are awful.)

It is normally recommended to start from a pre-trained model rather than training from scratch.
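
A minimal sketch (the nano checkpoint yolov8n.pt is an arbitrary choice):

model = YOLO('yolov8n.pt')   # pre-trained weights, downloaded automatically on first use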

Starting with this pre-trained model object, rerun the training and validation steps above.

  1. What performance do you get? Has it improved? (Hopefully, the performance is better than above.)

The model can also be applied to arbitrary images, and exported to file.
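
For example (the image URL and the ONNX export format are arbitrary choices):

results = model('https://ultralytics.com/images/bus.jpg')   # run inference; a local path works too
model.export(format='onnx')                                 # write the model to file in ONNX format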

Reflection

  1. Look in your file system. What files has YOLO downloaded? Start with your working directory and look for directories that you have not made yourself. You may have to look at the output to find the right location.
  2. Inspect the files defining the datasets. The image files should speak for themselves. The plain text files should contain labels and bounding boxes. Look also at the run results. What information do you find?
  3. Unfortunately, the .yaml file that defines the dataset is not stored locally. If you are curious, you can try to google for coco128.yaml; you may not find exactly the right file, but you will see what a data definition file looks like.

YOLO Predictions

See also predict in the documentation.

You can run your model on a single image like this.
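
A sketch, using the bus.jpg example image from the Ultralytics documentation:

results = model('https://ultralytics.com/images/bus.jpg')   # a single image gives a single-item result list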

The image is automatically downloaded over HTTP, so you can find it on your file system and inspect it. You can also give it a local file name instead, or pass a list of images or image files.

  1. What kind of object is results?

Since we passed one file in, there is only one item in the list we get out.

type(results)
len(results)
r = results[0]
type(r)

This should answer question 1 … now we can print the result object:

print(r)

As you can see, there are many fields, but only some are interesting. For segmentation we would have looked at masks, and for classification at probs. Now we want to look at the boxes.

print(r.boxes)

This defines a number of bounding boxes with associated labels. First look at the cls field:

print(r.boxes.cls)

This should give one label for each object detected. You can look up what the labels mean in the r.names dictionary.
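
For example, a small sketch that prints every detected class id together with its name:

for c in r.boxes.cls:
    print(int(c), r.names[int(c)])   # numeric class id and its human-readable name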

  1. What kind of objects have been detected? Can you see them in the image too?

The actual bounding boxes are given in two different formats, xyxy and xywh, plus two normalised variants. You can draw the bounding boxes on the image using OpenCV, for example:

import cv2
img = cv2.imread('bus.jpg')                      # the image the prediction was run on
x, y, x1, y1 = r.boxes.xyxy[0].cpu().numpy()     # first detected box, pixel coordinates; .cpu() in case the model ran on the GPU
cv2.rectangle(img, (round(x), round(y)), (round(x1), round(y1)), (0, 0, 255), 2)   # draw the box in red
cv2.imwrite('bus1.jpg', img)                     # save the annotated copy
  1. Look at the result image bus1.jpg. Is this a good prediction?

  2. (Optional) Write a function which loops through all the boxes and annotates the image. Validate all the detection results.
  3. (Optional) Test the model on one of your own images, or something else not coming from YOLO or Ultralytics.

Working on your own tracker

Return to the object tracker that you made earlier. Can you get YOLO to detect the object?

(1) Simple test

Take a single frame from one of your test videos and try to classify it using the model which you trained on COCO above.

  • How well does it find your object? What class does it predict?

(2) Train on custom data

Take a couple of images of your object and annotate them. Test one of these tools:

See also Tutorial on custom dataset including video

You need to define a dataset with your images, following the official Description of the dataset format for YOLO. You can also use the downloaded datasets as examples of how to do it.
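
A minimal sketch of such a dataset definition, for a hypothetical single-class dataset (all paths and names here are placeholders):

# my_object.yaml -- hypothetical custom dataset definition
path: ../datasets/my_object   # dataset root dir
train: images/train           # training images (relative to 'path')
val: images/val               # validation images (relative to 'path')

# Classes
names:
  0: my_object

The label files go in matching labels/train and labels/val directories, one plain-text file per image, with one line per box of the form class x_center y_center width height, all coordinates normalised to values between 0 and 1.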

When you have your dataset, try to train the network starting with the pre-trained model used above.
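
A sketch, assuming the dataset file above was saved as my_object.yaml (the epoch count is an arbitrary choice):

model = YOLO('yolov8n.pt')                      # the pre-trained model used above
model.train(data='my_object.yaml', epochs=50)   # fine-tune on the custom dataset
model.val()                                     # validate on the custom validation set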

(3) Test again

As in the initial test, try to detect the object in a couple of frames. How well does it perform now?

(Optional) Other datasets

Test your model on a different dataset. The easiest would be to test on other datasets prepared for YOLO.
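
For example, the val method accepts another dataset definition (coco8.yaml is one of the small packaged datasets; the choice is arbitrary):

metrics = model.val(data='coco8.yaml')   # evaluate the current model on a different dataset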

(Optional) On the detection of multiple shapes

Source: simple tutorial with Keras, detecting shapes
Goal: understand how machine learning models can predict multiple objects in one input image

Note that this tutorial uses Keras instead of torch.

  1. Clone the git repo from the tutorial: git clone git@github.com:jrieke/shape-detection.git
  2. The first notebook single-rectangle.ipynb shows how to generate the training set and train the model to detect a single rectangle.
    • Go through the notebook and add explanations to the code.
  3. The second notebook two-rectangles.ipynb shows how to generate images and train for two rectangles.
    • Go through the notebook and add explanations to the code.
  4. Another notebook multiple-rectangles.ipynb goes to three rectangles.
    • Go through the notebook and add explanations to the code.

Debrief - take-aways

  1. What is the difference between classification, detection, and segmentation?
  2. What does the prediction output look like when you use object detection?
  3. What could you use YOLO for in a future project?
  4. Where would you start reading up when you need YOLO?
  5. How would you plan an implementation project requiring object detection? What are the hard parts that require new knowledge? What parts will take the most time?
  6. Would you use pre-trained models, custom data, or both? Why?