Crusty-Banana/Streaming-Data-Pipeline

Deploy data pipeline system

First, bring up the stack with Docker Compose:

docker compose up -d

When you are done, shut everything down; the -v flag also removes the named volumes (and with them any stored data):

docker compose down -v
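For orientation, the compose file presumably wires up services along these lines. This is a hypothetical sketch only: the service names come from the docker exec commands in this README and the ports from the GUI section, while the images and everything else are assumptions, not the repository's actual docker-compose.yml.

```yaml
# Hypothetical sketch — not the repository's actual docker-compose.yml.
services:
  namenode:                          # HDFS NameNode, targeted by docker exec below
    image: bde2020/hadoop-namenode   # assumed image
    ports:
      - "9870:9870"                  # Hadoop web UI
    volumes:
      - ./Hadoop:/Hadoop             # where -copyToLocal lands on the host
  spark-master:                      # Spark master, where spark-submit runs
    image: bitnami/spark             # assumed image
    ports:
      - "9090:8080"                  # Spark web UI, remapped to 9090
  # kafka, zookeeper, nifi, and spark-worker services omitted for brevity
```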

Stream content to a topic

Then stream content to Kafka from your local machine:

python Kafka/kafka_producer.py
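Kafka/kafka_producer.py in the repository is the source of truth; as a rough idea of what such a producer looks like, here is a hedged sketch. The topic name, broker address, and record shape are assumptions, and kafka-python is just one common client.

```python
# Hypothetical sketch of a producer like Kafka/kafka_producer.py.
# Topic name and broker address are assumptions, not taken from the repo.
import json

TOPIC = "activity"          # assumed topic name
BROKER = "localhost:9092"   # assumed broker address

def serialize(record: dict) -> bytes:
    """Encode a record as UTF-8 JSON, a common wire format for Kafka values."""
    return json.dumps(record, sort_keys=True).encode("utf-8")

def main():
    # kafka-python imported lazily so serialize() works without a broker.
    from kafka import KafkaProducer
    producer = KafkaProducer(bootstrap_servers=BROKER, value_serializer=serialize)
    producer.send(TOPIC, {"user": "alice", "event": "click"})
    producer.flush()

if __name__ == "__main__":
    main()
```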

Process Data

When you want to process the data in HDFS with Spark, run

docker exec -it spark-master sh -c "spark-submit --master spark://spark-master:7077 /opt/bitnami/spark/process_data.py"
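The repository's process_data.py is the source of truth; a job of this shape might look like the following sketch. The HDFS paths mirror the ones used elsewhere in this README, but the NameNode RPC port, the record format, and the aggregation itself are assumptions.

```python
# Hypothetical sketch of a Spark job like process_data.py:
# read raw activity records from HDFS and write results to /output_zone.

def parse_activity(line: str) -> tuple:
    """Split one CSV line into (user, rest) — the record format is an assumption."""
    user, rest = line.strip().split(",", 1)
    return user, rest

def main():
    # pyspark imported lazily so parse_activity() is testable without Spark.
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("process_data").getOrCreate()
    lines = spark.sparkContext.textFile("hdfs://namenode:9000/raw_zone/fact/activity/*")
    counts = (lines.map(parse_activity)
                   .map(lambda ua: (ua[0], 1))
                   .reduceByKey(lambda a, b: a + b))   # events per user
    counts.saveAsTextFile("hdfs://namenode:9000/output_zone")
    spark.stop()

if __name__ == "__main__":
    main()
```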

Save to the local machine

When you want to download the processed data from HDFS to the local machine at /Hadoop, run (the command first removes any earlier copy of /Hadoop/output_zone)

docker exec -it namenode sh -c "rm -rf /Hadoop/output_zone && hdfs dfs -copyToLocal /output_zone /Hadoop"

Some useful commands

python Kafka/kafka_consumer.py                                              # read messages back from the topic
docker exec -it namenode sh -c "hdfs dfs -rm -r /output/*"                  # clear processed output in HDFS
docker exec -it namenode sh -c "hdfs dfs -rm -r /raw_zone/fact/activity/*"  # clear raw activity data in HDFS

GUI locations

NiFi: http://localhost:8080/nifi
Hadoop: http://localhost:9870
Spark: http://localhost:9090

Keep the end-of-line sequence as LF so the code stays consistent with the Linux VM:

git config --global core.autocrlf input
git config --global core.eol lf
git add --renormalize .
git commit -m "Normalize line endings to LF"
git push origin master
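Alternatively, a .gitattributes file committed to the repository enforces LF for every clone, without relying on each developer's global git config. A minimal sketch:

```
# .gitattributes — normalize all text files to LF in the repo and working tree
* text=auto eol=lf
```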
