The goal of foreground segmentation is to separate objects in the foreground from the background. Synonyms include background subtraction and foreground detection. Segmentation is typically not the end goal, but rather a precursor to further image processing (e.g., object recognition). This repository contains a Python implementation of median background subtraction: a simple and inexpensive algorithm for computing the foreground mask (Lo and Velastin, 2001).
You'll need to install OpenCV to execute the script. I prefer to manage my dependencies with Conda, although pip is another great alternative. Once the dependencies have been installed, follow these steps:
- Amend line 8 in results.py to point to the directory containing your input frames. File names should take the form in000000.ext, in000001.ext etc.
- Ensure that lines 17 and 18 point to the directory/file containing the ground truth image. Provided you use the format gt000000.ext, gt000001.ext etc., these should be picked up automatically.
- Adjust any parameters you see fit (e.g. `INPUT_FRAME`).
- Execute the script by typing the command `python results.py`. This will output the results to the output directory and print the PSNR to the console.
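The naming convention above (a prefix followed by a six-digit, zero-padded index) can be sketched as a small helper. This is only an illustration: `frame_path` is a hypothetical function that does not appear in `results.py`, and the `png` extension and directory names are placeholders for whatever your dataset uses.

```python
from pathlib import Path


def frame_path(directory, index, prefix="in", ext="png"):
    """Build a path such as <directory>/in000000.png.

    Hypothetical helper: the prefix is "in" for input frames and "gt" for
    ground truth; the extension is an assumption, so substitute your own.
    """
    return Path(directory) / f"{prefix}{index:06d}.{ext}"


# Example: the first input frame and the second ground truth image.
first_input = frame_path("frames", 0)
second_truth = frame_path("groundtruth", 1, prefix="gt")
```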
This demo uses the Scene Background Initialization (SBI) dataset which is available free of charge.
As mentioned before, we'll use median background subtraction to extract the foreground. Feel free to skip to Results if this isn't of interest. A high level overview of the pipeline is as follows:
- Perform median background estimation using $n$ frames
  $$B(x,y,t) = \text{median}(I(x,y,t-i))$$
- Reduce input frame noise with Gaussian blur
  $$G(x,y) = \frac{1}{2 \pi \sigma ^2} e ^{- \frac{x^2 + y^2}{2 \sigma ^2}}$$
- Subtract background from input frame
  $$R(x,y,t) = | I(x,y,t) - B(x,y,t) |$$
- Perform binary thresholding
  $$R(x,y,t) > Th$$
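The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the code in `results.py`: the function names are hypothetical, frames are assumed to be single-channel (greyscale) arrays of equal size, the kernel size is assumed to be odd, and the blur is written out by hand with replicated edges so the example has no dependency beyond NumPy.

```python
import numpy as np


def gaussian_kernel(size=3, sigma=0.485):
    """Sample G(x, y) on a size x size grid centred on zero, then normalise."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    kernel = np.exp(-(xs**2 + ys**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    return kernel / kernel.sum()  # weights sum to 1 so brightness is preserved


def blur(image, kernel):
    """Filter a 2-D image with a small symmetric kernel (edges replicated)."""
    half = kernel.shape[0] // 2
    padded = np.pad(image.astype(np.float64), half, mode="edge")
    rows, cols = image.shape
    out = np.zeros((rows, cols), dtype=np.float64)
    # Accumulate one shifted, weighted copy of the image per kernel entry.
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            out += kernel[dy, dx] * padded[dy:dy + rows, dx:dx + cols]
    return out


def median_background(frames):
    """B(x, y): per-pixel median over a stack of n frames."""
    return np.median(np.stack(frames, axis=0), axis=0)


def foreground_mask(frame, background, threshold=32, kernel=None):
    """Blur the frame, compute R = |I - B|, and keep pixels where R > Th."""
    if kernel is None:
        kernel = gaussian_kernel()
    residual = np.abs(blur(frame, kernel) - background)
    return np.where(residual > threshold, 255, 0).astype(np.uint8)
```

With OpenCV installed, `cv2.GaussianBlur` offers an optimised blur that could replace the hand-written `blur` above, although its default border handling differs, so pixels near the image edge may not match exactly.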
The figure below demonstrates the output of the pipeline when applied to the HighwayII image sequence:
Background estimation was performed using the first 50 frames (inclusive). As per the algorithm description, a Gaussian blur with a kernel size of 3 and a standard deviation of 0.485 was applied to the input frame. Finally, background subtraction was performed with a binary threshold of 32. Comparing the resulting foreground mask to the ground truth yields a PSNR of 13.8808 (higher is better).
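For reference, PSNR is commonly defined as $10 \log_{10}(MAX^2 / MSE)$, where $MSE$ is the mean squared error between the two images and $MAX$ is the largest possible pixel value. The sketch below assumes 8-bit masks with a peak of 255; the `psnr` function is illustrative, and the exact computation used in `results.py` may differ.

```python
import numpy as np


def psnr(mask, ground_truth, max_value=255.0):
    """Peak signal-to-noise ratio between two images of the same shape.

    Assumes the standard definition 10 * log10(max_value^2 / MSE).
    """
    diff = mask.astype(np.float64) - ground_truth.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images: no error to measure
    return 10.0 * np.log10(max_value ** 2 / mse)
```

Because the mean squared error sits in the denominator, a mask that agrees with the ground truth on more pixels produces a larger value, which is why higher is better.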
The contents of this repository are intended for educational purposes only. Use at your own peril! 🙂
Lo, B. P. L. and Velastin, S. A. (2001). Automatic congestion detection system for underground platforms. Proceedings of 2001 International Symposium on Intelligent Multimedia, Video and Speech Processing. pp. 158-161.