This repository processes air quality data from an IoT sensor and uses the River library to detect anomalies in that data.
We explore a two-fold approach: we first process the data in batch, and then we process it as a stream.
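To make the difference between the two modes concrete, here is a minimal sketch that uses only the Python standard library. It is not the code in this repository and it does not use River: a simple z-score stands in for the anomaly detector, and the sample readings are made up for illustration. The names `batch_anomalies`, `RunningZScore`, and `streaming_anomalies` are hypothetical; only the `score_one` / `learn_one` method names are chosen to mirror the convention used by River's anomaly detectors.

```python
import math
import statistics


def batch_anomalies(values, threshold=2.5):
    """Batch mode: compute mean/std over the full dataset, then flag outliers."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [False] * len(values)
    return [abs(v - mean) / std > threshold for v in values]


class RunningZScore:
    """Streaming stand-in detector (assumption: NOT River's algorithm).

    Keeps a running mean and variance with Welford's update, so each
    reading is processed once and never stored.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def score_one(self, x):
        # Need at least two readings before a spread estimate exists.
        if self.n < 2:
            return 0.0
        std = math.sqrt(self.m2 / self.n)
        return abs(x - self.mean) / std if std > 0 else 0.0

    def learn_one(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)


def streaming_anomalies(values, threshold=2.5):
    """Streaming mode: score each reading against past data, then learn it."""
    detector = RunningZScore()
    flags = []
    for x in values:
        flags.append(detector.score_one(x) > threshold)
        detector.learn_one(x)
    return flags


if __name__ == "__main__":
    # Made-up readings with one obvious spike.
    readings = [10.0, 10.5, 9.8, 10.2, 10.1, 9.9, 10.3, 55.0, 10.0, 10.2]
    print("batch:    ", batch_anomalies(readings))
    print("streaming:", streaming_anomalies(readings))
```

The batch function needs the whole dataset before it can flag anything, while the streaming detector scores each reading as it arrives using only what it has seen so far. That score-then-learn loop is the shape a streaming dataflow would apply to each incoming sensor reading.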
Read the complete blog here
You can find Jupyter notebooks in the notebooks folder. You can run the notebooks in Google Colab by clicking on the links below:
If you prefer to run this locally, set up a virtual environment and install the dependencies. For example, with conda:
conda create -n iot-bytewax python=3.10
conda activate iot-bytewax
Then install the dependencies with pip.
pip install -r requirements.txt
The steps for batch processing are as follows:
To run the batch version, run the following commands:
cd src/dataprocessing_batch
python main.py
The steps for streaming processing are as follows:
To run the streaming version, run the following command:
python -m bytewax.run dataflow:flow