This repository contains applications developed during the Distributed Data Mining Practical Course at the Technical University of Munich. During the course, I wrote several programs for Big Data processing and Data Mining using different languages and frameworks. Additionally, I used Terraform and Ansible to automate setting up the environment required to run these programs.
While working on the projects, I used the following technologies:
- Apache Spark SQL - Spark library for data processing
- Apache Spark MLlib - Spark library for machine learning
- Hadoop Distributed File System (HDFS) - distributed file system for storing large files
- Hadoop YARN - resource manager responsible for running and scheduling applications in a cluster
- Hadoop MapReduce - framework for processing large amounts of data using the MapReduce model
- Dask - Python library for parallelizing computations using multithreading and distributed processing
- Terraform - tool for automating infrastructure provisioning using Infrastructure as Code
- Ansible - tool for automating the configuration of nodes/VMs
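Running a real Hadoop MapReduce job requires a cluster, but the programming model itself is simple: a map phase emits key-value pairs, the framework shuffles them by key, and a reduce phase aggregates each group. As a rough sketch of that model (the function names here are illustrative, not Hadoop's Java API), the classic word-count job can be expressed in plain Python:

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the input split
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group values by key, as the framework does between phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts collected for each word
    return {word: sum(counts) for word, counts in groups.items()}

documents = ["big data mining", "data mining with spark"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
result = reduce_phase(shuffle(pairs))
# result == {'big': 1, 'data': 2, 'mining': 2, 'with': 1, 'spark': 1}
```

In Hadoop, the same map and reduce functions run in parallel across HDFS blocks on many nodes, with YARN scheduling the containers.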
During the course, I used the following languages:
- Scala - Spark SQL, Spark MLlib
- Java - Hadoop MapReduce, Spark SQL
- Python - PySpark, Dask
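The common pattern behind Dask (and PySpark) is splitting data into partitions, processing each partition independently on a worker, and combining the partial results. As a minimal standard-library analogue of that idea (this uses `concurrent.futures`, not Dask's own API, purely for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

def partition_sum(chunk):
    # Work applied to each partition independently
    return sum(x * x for x in chunk)

data = list(range(1000))
# Split the data into four partitions of 250 elements each
chunks = [data[i:i + 250] for i in range(0, len(data), 250)]

# Fan the partitions out to worker threads, then combine the partial results
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(partition_sum, chunks))

total = sum(partials)
# total == sum(x * x for x in range(1000)) == 332833500
```

Dask generalizes this scheme: the same partitioned computation can run on a thread pool, a process pool, or a distributed cluster without changing the user code.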