Collection of programs developed during Distributed Data Mining Course at TUM

Szczepaniak-M/Distributed-Data-Mining

Distributed Data Mining Practical Course

This repository contains applications developed during the Distributed Data Mining Practical Course at the Technical University of Munich. During the course, I wrote several programs for Big Data processing and data mining in different languages and frameworks. I also used Terraform and Ansible to automate the setup of the environment required to run these programs.

Technologies

I used the following technologies during the course:

  • Apache Spark SQL - Spark library for structured data processing
  • Apache Spark MLlib - Spark library for machine learning
  • Hadoop Distributed File System (HDFS) - distributed file system for storing large files
  • Hadoop YARN - resource manager responsible for scheduling and running applications in a cluster
  • Hadoop MapReduce - framework for processing large amounts of data using the MapReduce programming model
  • Dask - Python library for parallel computing, on a single machine (multithreading, multiprocessing) or across a distributed cluster
  • Terraform - tool for automating infrastructure provisioning using Infrastructure as Code
  • Ansible - tool for automating the configuration of nodes/VMs

During the course, I used the following languages:

  • Scala - Spark SQL, Spark MLlib
  • Java - Hadoop MapReduce, Spark SQL
  • Python - PySpark, Dask
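
For readers unfamiliar with the MapReduce programming model mentioned above, the following is a minimal, single-process sketch of it in plain Python. It is not taken from this repository and does not use Hadoop; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are assumptions made purely for illustration. It shows the three conceptual steps of a word count: mapping each line to key/value pairs, grouping values by key, and reducing each group to a total.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group values by key, mimicking the step between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce sequentially in one process (illustrative only)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, not data from the course.
    sample = ["big data big clusters", "data mining"]
    print(word_count(sample))
```

In a real Hadoop job the map and reduce steps run in parallel on different nodes and the framework performs the grouping; the sketch only mirrors the data flow.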
