
A list of project ideas that can be done using this dataset. Some of the ideas are simple to implement, whereas others could require more effort. If you are pursuing any of these ideas, I would be happy to help in whatever way I can.

Word cloud for lyricist

Create a word cloud for each lyrics writer in the dataset.
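A word cloud is drawn from word counts, so counting words per lyricist is a reasonable first step. Below is a minimal sketch of that step. The record layout (the `lyricist` and `lyrics` field names) and the sample rows are assumptions for illustration, not the actual schema of this dataset, and splitting on whitespace is only a rough tokenizer.

```python
from collections import Counter, defaultdict

# Made-up sample rows; the "lyricist" and "lyrics" field names are assumptions.
songs = [
    {"lyricist": "Lyricist A", "lyrics": "dil dil pyaar"},
    {"lyricist": "Lyricist A", "lyrics": "pyaar ka dil"},
    {"lyricist": "Lyricist B", "lyrics": "chaand raat chaand"},
]

def word_frequencies_by_lyricist(records):
    """Return {lyricist: Counter of lowercase whitespace-separated tokens}."""
    freqs = defaultdict(Counter)
    for rec in records:
        tokens = rec["lyrics"].lower().split()
        freqs[rec["lyricist"]].update(tokens)
    return dict(freqs)

freqs = word_frequencies_by_lyricist(songs)
print(freqs["Lyricist A"].most_common(2))  # [('dil', 3), ('pyaar', 2)]
```

Each per-lyricist `Counter` can then be handed to whatever word cloud renderer you prefer. Very common function words would probably need to be filtered out first.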

Analysis of multi-lingual lyrics

Are the lyrics of a song written purely in a single language?
How often are English/Punjabi/Urdu words mixed with Hindi?
Has this behaviour of mixed-language lyrics evolved over time?
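One rough way to approach these questions is to check each token against a word list for the other language and then group songs by decade. The sketch below assumes whitespace-separated tokens, a `year` field that may be missing, and a tiny hypothetical `ENGLISH_WORDS` set. A real analysis would need proper word lists for English, Punjabi and Urdu.

```python
from collections import defaultdict

# Hypothetical word list for illustration; a real one would come from a lexicon.
ENGLISH_WORDS = {"love", "baby", "party", "you"}

# Made-up sample rows; the "year" and "lyrics" field names are assumptions.
songs = [
    {"year": 1975, "lyrics": "dil ki baat"},
    {"year": 1978, "lyrics": "my love dil se"},
    {"year": 2012, "lyrics": "party all night baby"},
    {"year": 2015, "lyrics": "tere bina"},
    {"year": None, "lyrics": "love you"},
]

def english_share(lyrics):
    """Fraction of tokens found in ENGLISH_WORDS (0.0 for empty lyrics)."""
    tokens = lyrics.lower().split()
    if not tokens:
        return 0.0
    return sum(t in ENGLISH_WORDS for t in tokens) / len(tokens)

def mixed_song_rate_by_decade(records):
    """Return {decade: share of songs with at least one listed English word}."""
    totals = defaultdict(int)
    mixed = defaultdict(int)
    for rec in records:
        if rec["year"] is None:
            continue  # songs without a year cannot be placed on the timeline
        decade = rec["year"] // 10 * 10
        totals[decade] += 1
        if english_share(rec["lyrics"]) > 0:
            mixed[decade] += 1
    return {d: mixed[d] / totals[d] for d in sorted(totals)}

rates = mixed_song_rate_by_decade(songs)
print(rates)  # {1970: 0.5, 2010: 0.5}
```

Songs with a missing year are skipped here, so the trend is only as reliable as the year coverage.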

Filling missing data by crowdsourcing

Note that the dataset has some missing data, e.g. the year entry is missing for many songs. Crowdsourcing can be used to fill in that missing data. An example of a crowdsourcing platform is Amazon Mechanical Turk. If you pursue this project, please also propagate the changes to Giitayan, the original data source from which I created this dataset.
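Before posting anything to a crowdsourcing platform, it helps to pull out the records that actually need attention. The following is a minimal sketch, assuming each record has `title`, `film` and `year` fields (hypothetical names) and that a missing year shows up as an empty string or `None`. It writes the incomplete records to CSV text that could serve as a task list.

```python
import csv
import io

# Made-up sample rows; field names are assumptions, not the dataset's schema.
songs = [
    {"title": "Song One", "film": "Film X", "year": "1975"},
    {"title": "Song Two", "film": "Film Y", "year": ""},
    {"title": "Song Three", "film": "Film Z", "year": None},
]

def missing_year(records):
    """Return the records whose year is None, empty, or whitespace only."""
    return [r for r in records if not str(r.get("year") or "").strip()]

def to_task_csv(records):
    """Serialize title and film of each record as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["title", "film"], extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

tasks = missing_year(songs)
print(to_task_csv(tasks))
```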

Sentiment analysis of lyrics

Can we identify the emotion expressed in the song using its lyrics?
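A simple baseline to start from is a lexicon count: tally words associated with positive and negative emotion and compare the totals. The sketch below assumes romanized, whitespace-separated lyrics. The `POSITIVE` and `NEGATIVE` sets are a toy lexicon made up for illustration, and a real project would likely need a much larger, validated lexicon or a trained classifier.

```python
# Toy lexicon for illustration only (romanized Hindi words); not a validated resource.
POSITIVE = {"khushi", "pyaar", "muskaan"}
NEGATIVE = {"gham", "dard", "aansoo"}

def sentiment_score(lyrics):
    """Count of positive-lexicon tokens minus count of negative-lexicon tokens."""
    tokens = lyrics.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos - neg

def label(lyrics):
    """Map the score to a coarse label."""
    score = sentiment_score(lyrics)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(label("dil mein khushi aur pyaar"))  # positive
```

A count like this ignores negation and context, so it is best treated as a sanity check rather than a result.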

Auto lyrics generator

Using this lyrics dataset, can we write an intelligent lyrics generator bot? Can the bot take the tone of the song as an input parameter?
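As a starting point well short of an "intelligent" bot, a word-level Markov chain can generate text from the lyrics, and the tone parameter can be approximated by training only on songs with a matching label. The sketch below assumes a hypothetical `tone` field on each record, which the dataset does not necessarily provide (it might come from a sentiment labeling step), and uses made-up sample rows.

```python
import random
from collections import defaultdict

# Made-up sample rows; the "tone" field is a hypothetical label.
songs = [
    {"tone": "happy", "lyrics": "dil khush hai aaj dil gaata hai"},
    {"tone": "sad", "lyrics": "dil udaas hai aaj dil rota hai"},
]

def build_chain(records, tone=None):
    """Map each word to the list of words that follow it, optionally per tone."""
    chain = defaultdict(list)
    for rec in records:
        if tone is not None and rec["tone"] != tone:
            continue
        words = rec["lyrics"].split()
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return dict(chain)

def generate(chain, start, max_words=8, seed=0):
    """Random walk over the chain from `start`, stopping at a dead end or max_words."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    words = [start]
    while len(words) < max_words and words[-1] in chain:
        words.append(rng.choice(chain[words[-1]]))
    return " ".join(words)

happy_chain = build_chain(songs, tone="happy")
print(generate(happy_chain, "dil"))
```

Every adjacent word pair in the output was seen in the training lyrics, which is why the result tends to sound locally plausible but wanders over longer spans.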

Integration with multiple data sources

If we consider a song as an entity, it has multiple components. Lyrics are one; others could include its video and its popularity. Can we come up with any interesting insights when we look at multiple such components (data sources)? Some examples below:

YouTube video of the song - its views, likes etc.
Google Trends search interest for the song
Twitter hashtags for the song
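Whichever source is used, the first practical hurdle is matching songs across sources. The sketch below joins lyrics rows to rows from another source on a normalized title. The `views` numbers are made up, the field names are assumptions, and fetching real data from any of these services is out of scope here. Titles alone are often ambiguous, so a real join would probably also need the film name or year.

```python
def normalize_title(title):
    """Lowercase and collapse whitespace so near-identical titles compare equal."""
    return " ".join(title.lower().split())

# Made-up rows standing in for the lyrics dataset and an external source.
lyrics_rows = [
    {"title": "Song One", "lyrics": "dil ki baat"},
    {"title": "Song  Two", "lyrics": "tere bina"},
]
view_rows = [
    {"title": "song one", "views": 1200},
    {"title": "Song Three", "views": 50},
]

def join_on_title(left, right):
    """Inner join: keep rows from `left` that have a title match in `right`."""
    right_by_title = {normalize_title(r["title"]): r for r in right}
    joined = []
    for row in left:
        match = right_by_title.get(normalize_title(row["title"]))
        if match is not None:
            joined.append({**row, "views": match["views"]})
    return joined

joined = join_on_title(lyrics_rows, view_rows)
print(joined)
```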

Useful resources

Word frequencies for various Indian languages - IIIT Hyderabad