This project contains the custom workloads we will be using for Spark R&D under Prof. Umesh Bellur.

Build the project with:
mvn clean package
Class Name: "com.sapan.sparkbench.workload.wordcount.WordCount"
Parameters:
"name" = "custom"
"class" = "com.sapan.sparkbench.workload.wordcount.WordCount"
"inputdatafile" = "path/to/the/input/file"
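A minimal suite config for this workload might look like the following. This is a sketch assuming the standard spark-bench HOCON layout (spark-submit-config / workload-suites); the input path is a placeholder.

```hocon
spark-bench = {
  spark-submit-config = [{
    workload-suites = [{
      descr = "Custom WordCount workload"
      benchmark-output = "console"
      workloads = [{
        // "custom" tells spark-bench to load the class given below
        name = "custom"
        class = "com.sapan.sparkbench.workload.wordcount.WordCount"
        inputdatafile = "path/to/the/input/file"  // placeholder path
      }]
    }]
  }]
}
```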
Class Name: "com.sapan.sparkbench.workload.pagerank.PageRank"
Parameters:
"name" = "custom"
"class" = "com.sapan.sparkbench.workload.pagerank.PageRank"
"inputdatafile" = "path/to/the/input/file"
"numiterations" = "number of iterations to run (as a quoted string)"
Input should be a text file with the following format:
URL neighbour URL
URL neighbour URL
…..
URL neighbour URL
Each URL and its neighbouring URL must be separated by a space.
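For reference, a config for this workload might look like the sketch below, assuming the standard spark-bench HOCON layout. The input path is a placeholder and "10" is only an example iteration count.

```hocon
spark-bench = {
  spark-submit-config = [{
    workload-suites = [{
      descr = "Custom PageRank workload"
      benchmark-output = "console"
      workloads = [{
        name = "custom"
        class = "com.sapan.sparkbench.workload.pagerank.PageRank"
        inputdatafile = "path/to/the/input/file"  // space-separated URL pairs
        numiterations = "10"                      // example value, quoted
      }]
    }]
  }]
}
```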
Class Name: "com.sapan.sparkbench.workload.svm.SVM"
Parameters:
"name" = "custom"
"class" = "com.sapan.sparkbench.workload.svm.SVM"
"inputdatafile" = "path/to/the/input/file"
"numiterations" = "number of iterations to run (as a quoted string)"
The input file should be a text file with data in LIBSVM format.
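A corresponding config sketch, again assuming the standard spark-bench HOCON layout; the path and the iteration count "100" are placeholders:

```hocon
spark-bench = {
  spark-submit-config = [{
    workload-suites = [{
      descr = "Custom SVM workload"
      benchmark-output = "console"
      workloads = [{
        name = "custom"
        class = "com.sapan.sparkbench.workload.svm.SVM"
        inputdatafile = "path/to/the/input/file"  // LIBSVM-format text file
        numiterations = "100"                     // example value, quoted
      }]
    }]
  }]
}
```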
Class Name: "com.sapan.sparkbench.workload.decisiontree.DecisionTreeClassification"
Parameters:
"name" = "custom"
"class" = "com.sapan.sparkbench.workload.decisiontree.DecisionTreeClassification"
"inputdatafile" = "path/to/the/input/file"
"dataformat" = "format of the data (preferably libsvm)"
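A config sketch for this workload, under the same assumed spark-bench HOCON layout; the input path is a placeholder and "libsvm" follows the recommendation above:

```hocon
spark-bench = {
  spark-submit-config = [{
    workload-suites = [{
      descr = "Custom DecisionTree classification workload"
      benchmark-output = "console"
      workloads = [{
        name = "custom"
        class = "com.sapan.sparkbench.workload.decisiontree.DecisionTreeClassification"
        inputdatafile = "path/to/the/input/file"  // placeholder path
        dataformat = "libsvm"                     // preferred format per the notes above
      }]
    }]
  }]
}
```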
Some helpful resources are available in the src/main/helpful-resource folder, which also contains sample configs for some of the workloads.