Image and video classification practice with OpenCV.
Frameworks: DNN, Caffe, Darknet
Libraries/Modules: cmake, numpy, OpenCV contrib, and dlib
Algorithms: YOLOv3
Note: Check Reference Videos and Model config Release
- Concepts Overview
- Deep Learning Applications
- OpenCV Overview
- OpenCV's DNN
- DNN Process
- Setup/Installation
- File descriptions, commands, & outputs
- Resources
- AI vs. ML vs. DL
- Training Networks
- Hidden Layers
- Weights
- Loss Function
- Backpropagation
- Pre-trained Networks
- Image Classification
- Self-driving cars
- Handwriting transcription
- Speech recognition
- Language translation
- An open-source computer vision and machine learning software library
- Applications
- Facial recognition
- Object identification
- Human action classification
- Camera movement tracking
- Natively written in C++, can use wrappers for Python and Java
- No framework-specific limitations
- Uses an internal representation of models, which makes code optimization easier
- Has its own deep learning implementation - minimum external dependencies
- Uses BGR color format (instead of RGB)
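Because OpenCV stores pixels in BGR order, a channel swap is usually needed before handing images to RGB-based tooling. A minimal sketch (the filename is just a placeholder from this repo's images folder):

```python
import cv2

# OpenCV loads images in BGR channel order
img_bgr = cv2.imread("devon.jpg")

# Reorder channels to RGB before passing to RGB-based libraries (e.g. matplotlib)
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
```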
- Deep Neural Network Module
- NOT an entire deep learning framework
- Inference:
- When only a forward pass occurs (no backpropagation, so no learning by default)
- Engine example: input -> pretrained model -> result
- Makes coding easier - no training means no GPUs needed
- OpenCV 4's DNN module supports:
- Caffe
- TensorFlow
- Darknet
- ONNX format
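As a quick illustration, each of these formats has its own loader in the dnn module; the file names below are placeholders, not files shipped with this repo:

```python
import cv2

# One loader per supported framework/format (placeholder file names)
caffe_net   = cv2.dnn.readNetFromCaffe("model.prototxt", "model.caffemodel")
tf_net      = cv2.dnn.readNetFromTensorflow("frozen_graph.pb")
darknet_net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
onnx_net    = cv2.dnn.readNetFromONNX("model.onnx")
```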
- Load pre-trained models from other DL frameworks
- Pre-process images using blobFromImages()
- Pass blobs through loaded pre-trained model to get output predictions (blob -> model -> inference)
- Read the Model
cv2.dnn.readNetFromCaffe(prototxt, caffeModel)
- loads models and weights
- Create a Four-Dimensional Blob
blob = cv2.dnn.blobFromImage(image, [scalefactor], [size], [mean], [swapRB], [crop], [ddepth])
- Input the Blob into the Network
net.setInput(blob)
- Forward pass through the Network
outp = net.forward()
- produces an output prediction after a forward pass
- Summary of steps
- images
- blobFromImage()
- Blob
- Trained Model
- Inference
- Returns: 4D Tensor(NCHW) - # of images, # of channels, height, width
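Putting the four steps together, a minimal classification sketch might look like the following. The GoogLeNet prototxt/caffemodel pair and mean values are assumptions; substitute whatever model files you downloaded from the Model config Release:

```python
import cv2
import numpy as np

# Hypothetical model files - replace with your downloaded prototxt/caffemodel pair
net = cv2.dnn.readNetFromCaffe("bvlc_googlenet.prototxt", "bvlc_googlenet.caffemodel")

image = cv2.imread("devon.jpg")
blob = cv2.dnn.blobFromImage(image, 1.0, (224, 224), (104, 117, 123))

net.setInput(blob)    # feed the 4D blob into the network
outp = net.forward()  # single forward pass -> class scores

top_idx = int(np.argmax(outp[0]))       # index of the highest-scoring class
print(top_idx, float(outp[0][top_idx])) # class index and its score
```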
blobFromImage() Parameter | Description |
---|---|
image | Input image (1, 3, or 4 channels) |
scalefactor | Multiplier for image values |
size | Spatial size for output image |
mean | Scalar with mean values that are subtracted from BGR channels |
swapRB | Flag that swaps the first and last channels (BGR to RGB) |
crop | Flag to crop image after resize |
ddepth | Depth of output blob (CV_32F or CV_8U) |
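For reference, a hedged example of blobFromImage() with the parameters spelled out; the values shown are illustrative and should match the model you load:

```python
import cv2

image = cv2.imread("devon.jpg")

blob = cv2.dnn.blobFromImage(image,
                             scalefactor=1.0,       # leave pixel values unscaled
                             size=(224, 224),       # spatial size expected by the model
                             mean=(104, 117, 123),  # per-channel values subtracted
                             swapRB=False,          # keep BGR ordering (e.g. Caffe models)
                             crop=False)            # resize without centre-cropping

print(blob.shape)  # (1, 3, 224, 224) -> N images, C channels, H, W
```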
- Install Python and Anaconda
- Setup Virtual Environment
(In Anaconda Terminal)
- Create:
conda create --name ocv4 python=3.6
- Activate:
activate ocv4
- Install cmake:
pip install cmake
- Install numpy:
pip install numpy
- Install OpenCV contrib module:
pip install opencv-contrib-python==4.0.1.24
- Install dlib:
conda install -c conda-forge dlib
or: pip install dlib
- Check if everything installed properly
- Switch to python:
python
(command line should now start with >>>)
import numpy
import cv2
- If nothing returns then it was done right :)
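Optionally, printing the module versions is a quick sanity check (the exact version string depends on the wheels you installed):

```python
import numpy
import cv2

print(numpy.__version__)
print(cv2.__version__)  # should report 4.x if the contrib wheel installed correctly
```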
- Deep Learning Frameworks used
- OpenCV
- Caffe
- Darknet
Descriptions
- 02_01: Displays the 'devon.jpg' image & 3 intensity/grayscale channels
- 02_02: Displays the 'devon.jpg' image
- 02_03: Runs the 'shore.mov' video
- 04_01: Returns first few entries of 'synset_words.txt' file
- 04_02: Classification & Probability in an image
- 04_03: Classification & Probability in a video
- 04_04: Classification for an image & video using YOLOv3 w/ confThreshold=0.5
- 04_05: Classification for an image & video using YOLOv3 w/ confThreshold=0.4
- command:
python image.py
- output:
- command:
python image.py
- output:
- using dnn module as an inference engine for a video file
- command:
python video.py
- output:
- passing an image through the network using YOLOv3 (an object detection algorithm) with confidence thresholds of 0.5 and 0.4 (a rough sketch of the threshold filtering appears at the end of this section)
- command:
python yolo.py --image ../images/fruit.jpg
- 04_04 output:
- passing a video through the network using YOLOv3 w/ confidence threshold = 0.4
- command:
python yolo.py --video ../images/restaurant.mov
- output:
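For reference, a rough sketch of filtering YOLOv3 detections by a confidence threshold with OpenCV's dnn module; the .cfg/.weights names are placeholders for the files in the Model config Release, and getUnconnectedOutLayersNames() assumes a recent OpenCV 4 build:

```python
import cv2
import numpy as np

conf_threshold = 0.5  # keep detections scoring above this value

# Placeholder config/weights names - use the files from the Model config Release
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")

image = cv2.imread("../images/fruit.jpg")
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)

# Run the forward pass only up to the unconnected YOLO output layers
out_layers = net.getUnconnectedOutLayersNames()
outputs = net.forward(out_layers)

for output in outputs:
    for detection in output:
        scores = detection[5:]            # class scores follow 4 box coords + objectness
        class_id = int(np.argmax(scores))
        confidence = scores[class_id]
        if confidence > conf_threshold:   # discard low-confidence detections
            print(class_id, float(confidence))
```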