This is what I'm hoping will be a fun little side project.
The goal: using the lyrics data of the Million Song Dataset (MSD), generously provided by musiXmatch, develop an algorithm that classifies songs into genres based only on their lyrics.
The files below were downloaded from the MSD musiXmatch dataset page: http://labrosa.ee.columbia.edu/millionsong/musixmatch
- mxm_779k_matches.txt: maps MSD ids to musiXmatch ids
- msd_genre_dataset.txt: maps MSD ids to genre labels
- mxm_dataset_train.txt: contains word count information for ~210k MSD ids
- mxm_dataset_test.txt: contains word count information for ~27k MSD ids
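
As a starting point, here is a minimal sketch of how the word count files and the genre file could be read and joined on MSD id. It rests on assumptions about the file layouts that should be checked against the actual downloads: that lines starting with `#` are comments, that a line starting with `%` is a header (the vocabulary in the word count files, the column names in the genre file), that each word count line looks like `track_id,mxm_id,idx:count,...` with 1-based indices into the vocabulary, and that the genre file has the genre in the first column and the MSD id in the second. The function names and the sample rows are made up for illustration.

```python
from collections import Counter


def load_mxm(lines):
    """Read word counts; returns (vocab, {track_id: Counter of word -> count})."""
    vocab, songs = [], {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumed: '#' marks a comment line
        if line.startswith("%"):
            vocab = line[1:].split(",")  # assumed: '%' line lists the vocabulary
            continue
        parts = line.split(",")
        track_id = parts[0]  # parts[1] is the musiXmatch id, unused here
        counts = Counter()
        for pair in parts[2:]:
            idx, cnt = pair.split(":")
            counts[vocab[int(idx) - 1]] = int(cnt)  # assumed: 1-based word index
        songs[track_id] = counts
    return vocab, songs


def load_genres(lines):
    """Read genre labels; returns {track_id: genre}."""
    genres = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "%")):
            continue  # skip comments and the column header
        genre, track_id = line.split(",", 2)[:2]  # assumed column order
        genres[track_id] = genre
    return genres


# Made-up rows in the assumed formats (not real data from the files).
sample_mxm = ["# comment", "%i,the,you,to", "TRAAAAA,123,1:6,2:4,4:1"]
sample_genres = [
    "# comment",
    "%genre,track_id,artist_name,title",
    "classic pop and rock,TRAAAAA,Some Artist,Some Title",
]

vocab, songs = load_mxm(sample_mxm)
genres = load_genres(sample_genres)

# Keep only songs that have both lyrics and a genre label.
labeled = {tid: (genres[tid], bow) for tid, bow in songs.items() if tid in genres}
```

With the real files, the same functions could be fed an open file handle, e.g. `load_mxm(open("mxm_dataset_train.txt", encoding="utf-8"))`, since they only iterate over lines.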