---
title: Object Detection
categories: session
---

# Reading

+ Szeliski (2021) Chapter 6, particularly Section 6.3
+ Tutorials cited below.

# Briefing

## Recap from last session

1. How far did you get with statistical estimation?
   See [Statistics]().

## Object Detection

| Classification | Detection | Segmentation |
| :- | :- | :- |
| ![cat 0.92](https://res.cloudinary.com/practicaldev/image/fetch/s--TVyKMU1e--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/l7qo0izq963numv9jxue.jpg) | ![bounding box](https://res.cloudinary.com/practicaldev/image/fetch/s--qE-I2tss--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/02c7l1o7haf7wv9f17qr.jpg) | ![pixels](https://res.cloudinary.com/practicaldev/image/fetch/s--9EVSzsw4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/y09uxmlrcgvfj8a87wol.jpg) |

### The YOLO API

+ Pre-trained model
+ Packaged data sets
+ Dataset API; defined by a YAML file, like this

  ```
  # Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
  path: ../datasets/coco8  # dataset root dir
  train: images/train  # train images (relative to 'path') 4 images
  val: images/val  # val images (relative to 'path') 4 images
  test:  # test images (optional)

  # Classes (80 COCO classes)
  names:
    0: person
    1: bicycle
    2: car
    ...
    77: teddy bear
    78: hair drier
    79: toothbrush
  ```

+ Retrain the pre-trained models with custom data to tune them to particular applications
+ [Data Augmentation](https://medium.com/red-buffer/apply-data-augmentation-on-yolov5-yolov8-dataset-958e89d4bc5d)
    + generate additional image variants to increase the training set
    + change resolution, contrast, light
    + flip and rotate
    + mosaics
    + This can be automated, i.e. a free boost to the dataset

# Exercises

Several tutorials and documentation pages are cited below.
I **strongly** recommend that you ignore them at first, focusing on attempting the tasks given.
Then, when you **see what happens**, you can look up the tutorials and documentation for further ideas.
Even so, there is probably more tutorial material than you have time to read,
so read what inspires you, and don't read at the expense of your own tests.

Whether you complete all the tasks or not, **before** you call it a day,
look through the debrief questions at the end, and see what you can answer.

Note that performance will be very poor on a CPU.
If you are not able to run YOLO on CUDA, it is recommended that you focus on
skimming the tutorials and watching video demos.
You can check if CUDA is available in Python by running

```
import torch
print(torch.cuda.is_available())  # True if a CUDA device can be used
```

## YOLO Quick start

See also the [Official Ultralytics tutorial](https://docs.ultralytics.com/quickstart/#install-ultralytics).

It is recommended to install torch first, and then ultralytics.
This is to make sure that `torch` is installed correctly for your hardware and drivers.

```
pip install torch
pip install ultralytics
```

YOLO has a command-line interface, the `yolo` command, but we will use the Python API.
See also the [Official Python Guide](https://docs.ultralytics.com/usage/python/).

The one object of particular interest is the `YOLO` class (type):

```python
from ultralytics import YOLO
```

A model can be trained and validated from scratch as follows:

```python
model = YOLO('yolov8n.yaml')  # architecture only, untrained weights
results = model.train(data='coco128.yaml', epochs=3)
results = model.val()
```

This downloads the 128 images of the coco128 dataset.
Interestingly, this defines both a training and a validation set, but they are identical.

Note the last line of output; it tells you where the performance reports are,
somewhere under a `runs` subdirectory. Open the files to look.

1. *What performance does the validation show?
   Is it satisfactory?*
   (Look in particular at the confusion matrix, and don't despair if the results are awful.)

It is normally recommended to use a pre-trained model, e.g.

```python
model = YOLO('yolov8n.pt')  # pre-trained weights
```

Starting with this pre-trained `model` object, rerun the training and validation steps above.

2. *What performance do you get? Has it improved?*
   (Hopefully, the performance is better than above.)

The model can also be applied to arbitrary images, and exported to file.

```python
results = model('https://ultralytics.com/images/bus.jpg')
success = model.export(format='onnx')
```

### Reflection

3. Look in your file system. *What files has YOLO downloaded?*
   Start with your working directory and look for directories that you have not made yourself.
   You may have to look at the output to find the right location.
4. Inspect the files defining the datasets.
   The image files should speak for themselves.
   Plain text files should contain labels and bounding boxes.
   Look also at the run results.
   *What information do you find?*
5. Unfortunately, the `.yaml` file that defines the dataset is not stored with the downloaded data.
   If you are curious, you can search online for `coco128.yaml`;
   you may not find exactly the right file, but you will see what a data definition file looks like.

## YOLO Predictions

See also [predict in the documentation](https://docs.ultralytics.com/modes/predict/#boxes).

You can run your model on a single image like this.

```python
results = model('https://ultralytics.com/images/bus.jpg')
```

The image is automatically downloaded over HTTP, so you can find it on your file system and inspect it.
You can also give it a local file name instead, or pass a list of images or image files.

1. *What kind of object is the `results`?*

Passing one file in, there is only one item in the list you get out.

```
print(type(results))
print(len(results))
r = results[0]  # the result for the first (and only) image
print(type(r))
```

This should answer question 1 ...
now we can print the result object:

```
print(r)
```

As you can see, there are many fields, but only some are interesting.
For segmentation, we would have looked at `masks`, and for classification at `probs`.
Now we want to look at the boxes.

```
print(r.boxes)
```

This defines a number of bounding boxes with associated labels.
First look at the `cls` field

```
print(r.boxes.cls)
```

This should give one label for each object detected.
You can look up what the labels mean in the `r.names` dictionary.

2. *What kind of objects have been detected? Can you see them in the image too?*

The actual bounding boxes are given in two different formats, `xyxy` and `xywh`,
plus their normalised versions, `xyxyn` and `xywhn`.

You can draw the bounding boxes on the image using OpenCV.
(based on [example](https://stackoverflow.com/questions/63923800/drawing-bounding-rectangles-around-multiple-objects-in-binary-image-in-python))

```
import cv2

img = cv2.imread('bus.jpg')
# Corner coordinates of the first box as plain integers;
# tolist() also works when the tensor lives on the GPU
x0, y0, x1, y1 = map(round, r.boxes.xyxy[0].tolist())
# Draw a red rectangle (OpenCV uses BGR colour order), 2 pixels thick
cv2.rectangle(img, (x0, y0), (x1, y1), (0, 0, 255), 2)
cv2.imwrite('bus1.jpg', img)
```

3. *Look at the result image `bus1.jpg`. Is this a good prediction?*
4. (Optional) Write a function which loops through all the boxes and annotates the image.
   Validate all the detection results.
5. (Optional) Test the model on one of your own images,
   or something else not coming from YOLO or Ultralytics.

## Working on your own tracker

Return to the object tracker that you made earlier.
Can you get YOLO to detect the object?

### (1) Simple test

Take a single frame from one of your test videos and try to classify it
using the model which you trained on COCO above.

+ *How well does it find your object? What class does it predict?*

### (2) Train on custom data

Take a couple of images of your object and annotate them.
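
Many annotation tools can export labels in the YOLO text format, which is also what you should find in the label files of the downloaded datasets: one line per object, holding the class index followed by the box centre, width and height, all normalised by the image size. The sketch below shows how one such line could be produced from a pixel box. The function name `yolo_label_line` and the example numbers are made up for illustration and are not part of any library.

```python
def yolo_label_line(class_id, x0, y0, x1, y1, img_w, img_h):
    """Format one pixel box (corner coordinates) as a YOLO label line."""
    # Box centre, width and height as fractions of the image size
    xc = (x0 + x1) / 2 / img_w
    yc = (y0 + y1) / 2 / img_h
    w = (x1 - x0) / img_w
    h = (y1 - y0) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


# Hypothetical example: class 0, box from (100, 200) to (300, 600)
# in an image that is 640 pixels wide and 800 pixels high
line = yolo_label_line(0, 100, 200, 300, 600, img_w=640, img_h=800)
print(line)
```

Comparing such a line with a label file from one of the downloaded datasets is a quick way to check that your own annotations follow the same convention.
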
Test one of these tools:

+ [Five best image annotation tools](https://blog.roboflow.com/best-image-annotation-tools/)

See also [Tutorial on custom dataset](https://towardsdatascience.com/how-to-train-a-custom-object-detection-model-with-yolo-v5-917e9ce13208), including video.

You need to define a dataset with your images, following the official
[Description of the dataset format](https://docs.ultralytics.com/datasets/detect/) for YOLO.
You should also use the downloaded datasets as an example to show you how to do it.

When you have your dataset, try to train the network starting with the pre-trained model used above.

### (3) Test again

As in the initial test, try to detect the object in a couple of frames.
How well does it perform now?

## (Optional) Other datasets

Test your model on a different dataset.
The easiest would be to test on other
[datasets prepared for YOLO](https://docs.ultralytics.com/datasets/detect/#usage).

## (Optional) On the detection of multiple shapes

Source
: [simple tutorial](https://towardsdatascience.com/object-detection-with-neural-networks-a4e2c46b4491) with Keras, detecting shapes

Goal
: understand how machine learning models can predict multiple objects in one input image

Note that this tutorial uses Keras instead of torch.

1. Clone the git repo from the tutorial
   `git clone git@github.com:jrieke/shape-detection.git`
2. The first notebook `single-rectangle.ipynb` shows how to generate the training set
   and train the model to detect a single rectangle.
    + Go through the notebook and add explanations to the code.
3. The second notebook `two-rectangles.ipynb` shows how to generate images and train for two rectangles.
    + Go through the notebook and add explanations to the code.
4. Another notebook `multiple-rectangles.ipynb` goes to three rectangles.
    + Go through the notebook and add explanations to the code.

# Debrief - take-aways

1. What is the difference between classification, detection, and segmentation?
2.
   What does the prediction output look like when you use object detection?
3. What could you use YOLO for in a future project?
4. Where would you start reading up when you need YOLO?
5. How would you plan an implementation project requiring object detection?
   What are the hard parts that require new knowledge?
   What parts will take the most time?
6. Would you use pre-trained models, custom data, or both? Why?
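
When discussing debrief question 2, it can help to summarise a prediction as a count per class. The following is a minimal sketch, not part of the Ultralytics API: `count_detections` is a hypothetical helper, and the class indices and names are stand-in values mimicking `r.boxes.cls` and `r.names`, so that the block runs without YOLO installed.

```python
from collections import Counter


def count_detections(class_ids, names):
    """Count detections per class name.

    class_ids: iterable of class indices (floats are accepted and truncated)
    names: mapping from class index to class name
    """
    return Counter(names[int(c)] for c in class_ids)


# Stand-in values (assumed, not real model output)
names = {0: "person", 5: "bus", 11: "stop sign"}
class_ids = [5.0, 0.0, 0.0, 0.0, 11.0]

counts = count_detections(class_ids, names)
print(counts)
```

With a real result object, the same helper could presumably be called as `count_detections(r.boxes.cls.tolist(), r.names)`.
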