CIE 427 first mini project: an analysis of the Reddit dataset.
We run several analyses on a 10% sample of the data using the Hadoop (MapReduce) paradigm. Modify the Hadoop/Java variables in the bash files to match your installation. To run any of the tasks listed below, download the dataset file, take a sample of it, name the sample "test.txt", put it in Hadoop (HDFS), and run the bash scripts.
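The sampling step can be done in many ways; the following is only a minimal sketch, assuming the downloaded dataset is a text file with one record per line. The function name `sample_lines`, the seed, and the input path are hypothetical and not part of this project.

```python
import random

def sample_lines(src_path, dst_path="test.txt", fraction=0.10, seed=427):
    """Copy roughly `fraction` of the lines in src_path to dst_path.

    Assumption: the dataset stores one record per line.
    Returns the number of lines written.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    kept = 0
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            # Keep each line independently with probability `fraction`.
            if rng.random() < fraction:
                dst.write(line)
                kept += 1
    return kept
```

The file is streamed line by line, so it never has to fit in memory. The resulting "test.txt" can then be copied into HDFS, for example with `hdfs dfs -put test.txt <hdfs-dir>`, where `<hdfs-dir>` stands for whatever directory the bash scripts expect.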
The tasks are:
Most discussed topics for each subreddit and username, with a focus on the top subreddits
Reply rate compared with the controversiality of a comment/post
Topics that yield the most upvotes and/or the fewest downvotes
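To illustrate the shape of the first task, here is a small in-memory sketch of a map step and a reduce step that count words per subreddit. It is not the project's actual Hadoop job. It assumes each input line is a JSON object with `subreddit` and `body` fields, treats a "topic" as a single lowercase word, and uses a tiny illustrative stop-word list; the function names are hypothetical.

```python
import json
import re
from collections import Counter, defaultdict

# Assumption: a tiny illustrative stop-word list; a real run needs a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "is", "to", "of", "in", "it", "i"}
WORD_RE = re.compile(r"[a-z]+")

def map_comment(line):
    """Map step: emit ((subreddit, word), 1) for each non-stop word."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return  # skip malformed lines
    if not isinstance(record, dict):
        return
    # Assumption: records carry "subreddit" and "body" fields.
    subreddit = record.get("subreddit")
    body = record.get("body") or ""
    if not subreddit:
        return
    for word in WORD_RE.findall(body.lower()):
        if word not in STOP_WORDS:
            yield (subreddit, word), 1

def reduce_counts(pairs):
    """Reduce step: sum the counts for each (subreddit, word) key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals

def top_words(lines, k=3):
    """Run map and reduce in memory, then keep the k most frequent words per subreddit."""
    pairs = (pair for line in lines for pair in map_comment(line))
    per_subreddit = defaultdict(Counter)
    for (subreddit, word), count in reduce_counts(pairs).items():
        per_subreddit[subreddit][word] = count
    return {name: counts.most_common(k) for name, counts in per_subreddit.items()}
```

Calling `top_words(open("test.txt", encoding="utf-8"), k=10)` would give the ten most frequent words for each subreddit in the sample. The other two tasks follow the same pattern with a different key and value, for example keying on the controversiality flag or summing vote counts per word, provided the dataset exposes those fields.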