Below is a list of project ideas that could be pursued with this dataset. Some are simple to implement, whereas others may require more effort. If you pursue any of these ideas, I would be happy to help in whatever way I can.
Create a word cloud for each lyricist in the dataset.
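A word cloud is essentially a rendering of word frequencies, so a reasonable first step is to count words per lyricist. The minimal sketch below assumes each song is a dict with hypothetical `writer` and `lyrics` keys. The actual column names may differ, and the sample songs and stopword list are invented for illustration.

```python
import string
from collections import Counter, defaultdict

# Hypothetical record layout: one dict per song with "writer" and "lyrics" keys.
songs = [
    {"writer": "Writer A", "lyrics": "Dil dil pyaar, pyaar pyaar"},
    {"writer": "Writer A", "lyrics": "pyaar ka geet"},
    {"writer": "Writer B", "lyrics": "sapne sapne raat"},
]

# Tiny illustrative stopword list; a real analysis needs a much fuller one.
STOPWORDS = {"ka", "ki", "ke", "hai", "mein", "aur"}

def tokenize(text):
    # Split on whitespace and trim ASCII punctuation. Whitespace splitting keeps
    # Devanagari words intact, whereas regex \w can split them at vowel signs.
    return [t for t in (w.strip(string.punctuation) for w in text.lower().split()) if t]

def word_frequencies_by_writer(records, stopwords=STOPWORDS):
    """Return {writer: Counter(word -> count)} with stopwords removed."""
    freqs = defaultdict(Counter)
    for rec in records:
        freqs[rec["writer"]].update(t for t in tokenize(rec["lyrics"]) if t not in stopwords)
    return dict(freqs)

freqs = word_frequencies_by_writer(songs)
print(freqs["Writer A"].most_common(2))  # [('pyaar', 4), ('dil', 2)]
```

Each `Counter` could then be handed to a rendering library. The third-party `wordcloud` package, for instance, has a `WordCloud.generate_from_frequencies` method. Devanagari lyrics would at least need a font that supports the script, passed via its `font_path` argument.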
- Are the lyrics of a song written purely in a single language?
- How often are English/Punjabi/Urdu words mixed with Hindi?
- Has this behaviour of mixed-language lyrics evolved over time?
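One way to start on these questions is lexicon-based token tagging: label each word by the word list it appears in, then aggregate per song and per year. The sketch below rests on strong assumptions. The lexicons are hypothetical and deliberately tiny, lyrics are assumed to be romanized and whitespace-separated, and any word not found in a lexicon is simply counted as Hindi. Songs with a missing year are skipped.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical, deliberately tiny lexicons for illustration only. A real study
# would need large word lists or a trained language-identification model,
# because romanized Hindi, Urdu and Punjabi share much of their vocabulary.
LEXICONS = {
    "english": {"love", "baby", "party", "dance", "tonight"},
    "punjabi": {"kudi", "munda", "soniye"},
    "urdu": {"ishq", "khwaab", "mohabbat"},
}

def language_fractions(lyrics):
    """Share of tokens per language; tokens in no lexicon are assumed to be Hindi."""
    tokens = lyrics.lower().split()
    if not tokens:
        return {}
    counts = defaultdict(int)
    for tok in tokens:
        lang = next((name for name, words in LEXICONS.items() if tok in words), "hindi")
        counts[lang] += 1
    return {lang: n / len(tokens) for lang, n in counts.items()}

def is_mixed(lyrics):
    """True if tokens from more than one language were detected."""
    return len(language_fractions(lyrics)) > 1

def english_share_by_year(songs):
    """Mean English-token share per year; songs without a year are skipped."""
    by_year = defaultdict(list)
    for song in songs:
        if song.get("year") is None:
            continue
        by_year[song["year"]].append(language_fractions(song["lyrics"]).get("english", 0.0))
    return {year: mean(shares) for year, shares in sorted(by_year.items())}

songs = [
    {"year": 1975, "lyrics": "mera dil tera ishq"},
    {"year": 2010, "lyrics": "baby tonight dil party"},
    {"year": 2010, "lyrics": "tu hi meri jaan"},
    {"year": None, "lyrics": "kudi dance"},
]
print(english_share_by_year(songs))  # {1975: 0.0, 2010: 0.375}
```

Because so many years are missing, any trend computed this way would cover only the subset of songs that have a year.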
Note that the dataset has some missing data; for example, the year is missing for many songs. Crowdsourcing could be used to fill in these gaps; Amazon Mechanical Turk is one example of a crowdsourcing platform. If you pursue this project, please also propagate the changes to Giitayan, the original data source from which I created this dataset.
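A crowdsourcing project needs two pieces of plumbing: exporting the incomplete records as tasks, and combining several workers' answers into one value. Platforms such as Mechanical Turk can typically take a CSV of task inputs for batch jobs. The sketch below writes such a CSV and then aggregates answers by majority vote. The field names (`song_id`, `title`, `film`) are hypothetical. The thresholds (three votes, two-thirds agreement) are arbitrary choices for illustration, not recommendations from any platform.

```python
import csv
import io
from collections import Counter, defaultdict

def missing_year_batch(songs, fields=("song_id", "title", "film")):
    """Return CSV text with one row (task) per song whose year is missing."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fields), extrasaction="ignore")
    writer.writeheader()
    for song in songs:
        if not song.get("year"):
            writer.writerow(song)
    return buf.getvalue()

def aggregate_years(answers, min_votes=3, min_agreement=2 / 3):
    """Combine (song_id, year) answers from several workers by majority vote.

    A year is accepted only if at least `min_votes` workers answered and the
    most common answer received at least `min_agreement` of the votes.
    """
    by_song = defaultdict(list)
    for song_id, year in answers:
        by_song[song_id].append(year)
    accepted = {}
    for song_id, years in by_song.items():
        year, votes = Counter(years).most_common(1)[0]
        if len(years) >= min_votes and votes / len(years) >= min_agreement:
            accepted[song_id] = year
    return accepted

songs = [
    {"song_id": 1, "title": "Title X", "film": "Film F", "year": None},
    {"song_id": 2, "title": "Title Y", "film": "Film G", "year": 1965},
]
print(missing_year_batch(songs))  # header row plus one task row for song 1

answers = [(1, 1972), (1, 1972), (1, 1973), (2, 1990), (2, 1991), (2, 1992), (3, 2001)]
print(aggregate_years(answers))  # {1: 1972}
```

Songs that fail the agreement threshold (song 2 above) or receive too few answers (song 3) are left unresolved, which seems safer than writing a guessed year back to the source.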
Can we identify the emotion expressed in the song using its lyrics?
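If a subset of songs could be hand-labelled with emotions, a simple supervised baseline becomes possible. The sketch below uses multinomial Naive Bayes with add-one smoothing over whitespace tokens, written from scratch. This is one standard text-classification technique chosen for illustration, not a prescribed method. The labels (`happy`, `sad`) and training lyrics are invented.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesEmotion:
    """Multinomial Naive Bayes over whitespace tokens, with add-one smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> token counts
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        # Words never seen in training carry no information, so drop them.
        tokens = [t for t in text.lower().split() if t in self.vocab]
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical hand-labelled examples, invented for illustration.
texts = ["khushi naach gaana mauj", "aansu dard judaai gham",
         "khushi mauj masti", "dard tanhai gham"]
labels = ["happy", "sad", "happy", "sad"]

model = NaiveBayesEmotion().fit(texts, labels)
print(model.predict("aaj khushi aur masti"))  # happy
print(model.predict("dard aur gham"))         # sad
```

A bag-of-words model ignores word order and context, so it is best treated as a baseline against which richer models can be compared.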
Using this lyrics dataset, can we build an intelligent lyrics-generator bot? Could the bot take the desired tone of the song as an input parameter?
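A very simple baseline generator is a word-level Markov chain. In this sketch, conditioning on tone is approximated by training a separate chain for each tone label. The `tone` field and its values are hypothetical. In practice the tone would have to come from manual labels or from an emotion classifier, as in the emotion-identification idea.

```python
import random
from collections import defaultdict

def train_chains(songs, order=1):
    """Build one word-level Markov chain per tone label.

    chains[tone][context] is a list of words observed after `context`
    (a tuple of `order` words); duplicates encode frequency.
    """
    chains = defaultdict(lambda: defaultdict(list))
    for song in songs:
        words = song["lyrics"].split()
        for i in range(len(words) - order):
            context = tuple(words[i:i + order])
            chains[song["tone"]][context].append(words[i + order])
    return chains

def generate(chains, tone, length=10, seed=None):
    """Random walk through the chain trained for the requested tone."""
    rng = random.Random(seed)
    chain = chains[tone]
    if not chain:
        raise ValueError(f"no training data for tone {tone!r}")
    context = rng.choice(list(chain))
    out = list(context)
    while len(out) < length:
        followers = chain.get(context)
        if not followers:  # dead end: restart from a random context
            context = rng.choice(list(chain))
            out.extend(context)
            continue
        out.append(rng.choice(followers))
        context = tuple(out[-len(context):])
    return " ".join(out[:length])

# Hypothetical training data with invented tone labels.
songs = [
    {"tone": "happy", "lyrics": "aaj mausam bada beimaan hai"},
    {"tone": "sad", "lyrics": "dil ke tukde tukde"},
]
chains = train_chains(songs)
print(generate(chains, "sad", length=6, seed=1))
```

With a corpus of this size, a Markov chain mostly recombines existing lines; it is useful mainly as a reference point for judging more capable neural generators.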
If we consider a song as an entity, it has multiple components: the lyrics are one, and others include its video and its popularity. Can we find interesting insights by looking at several such components (data sources) together? Some examples:
- YouTube video of the song: its views, likes, etc.
- Google Trends search interest for the song
- Twitter hashtags for the song
- Word frequencies for various Indian languages - IIIT Hyderabad