ZakiChammaa/bigdata-project

bigdata-project

Prepare Environment

  • Clone project:

    git clone git@github.com:ZakiChammaa/bigdata-project.git
  • Create a virtual environment:

    virtualenv -p python3 virtualenv
  • Activate the virtual environment and install the required libraries:

    source virtualenv/bin/activate
    pip install -r requirements.txt

How to run

  • Build machine learning model:

    You can build either the decision tree or the random forest model. Note that you have to delete the model/ directory every time you want to build a new one.

    To build the decision tree model:

    python ml/decision_trees.py

    To build the random forest model:

    python ml/random_forest.py
  • Stream the data and evaluate:

    To stream the data, open two terminals.

    In the first terminal, run the following:

    python server.py localhost 9999

    In the second terminal, run the following:

    ./virtualenv/bin/spark-submit streaming.py localhost 9999

    When the data has finished streaming, stop the program and run the following to get the accuracy:

    python test_accuracy.py
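
For readers unfamiliar with the two-terminal setup above, here is a minimal sketch of what a line-oriented TCP data server could look like. It is not the repository's server.py: the function name `stream_lines`, its parameters, and the newline-delimited UTF-8 format are all assumptions made for illustration. It shows one process listening on a host and port and sending records line by line to whichever client connects.

```python
import socket
import time


def stream_lines(lines, host="localhost", port=9999, delay=0.0, ready=None):
    """Send each item of `lines` to the first TCP client that connects.

    Hypothetical helper, not taken from the repository. Records are sent
    as UTF-8 text, one per line.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        # Allow quick restarts on the same port.
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))
        srv.listen(1)
        if ready is not None:
            # Report the bound port (useful when port=0 picks a free one).
            ready(srv.getsockname()[1])
        conn, _addr = srv.accept()  # blocks until a client connects
        with conn:
            for line in lines:
                # Ensure exactly one trailing newline per record.
                conn.sendall((line.rstrip("\n") + "\n").encode("utf-8"))
                if delay:
                    time.sleep(delay)  # optional pause between records
```

Under these assumptions, calling `stream_lines(records, "localhost", 9999)` in one terminal would serve the records to a consumer in the other terminal, for example a Spark Streaming job that reads text from a socket. Whether streaming.py consumes the data this way is not stated in this README.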

Note that the data is already cleaned up and is available in the data folder. If you want to run the preprocessing script, run the following:

    python data_preprocessing.py
