A comprehensive collection of lyrics and other metadata for Bollywood and non-Bollywood songs from the 1940s through the 2000s. If you are an engineering student in India looking for interesting project ideas that use this dataset, I have listed them on this page.
The data for this collection comes from https://www.giitaayan.com/. Giitaayan's data files are stored at https://github.com/v9y/giit. Thanks to the maintainers and contributors of Giitaayan.
To use the dataset, download the lyrics.csv file.
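Once lyrics.csv is downloaded, a quick way to get a feel for it is to load it and look at the column names. The sketch below is only an illustration: the helper name load_lyrics is hypothetical, and it assumes the file is UTF-8 encoded and has a header row, neither of which is confirmed here. It deliberately avoids guessing column names and prints whatever the header contains.

```python
import csv
from pathlib import Path


def load_lyrics(csv_path):
    """Return (column_names, rows) read from a lyrics CSV file.

    Hypothetical helper; assumes UTF-8 text and a header row.
    """
    # newline="" lets the csv module handle line breaks inside quoted
    # fields, which multi-line lyrics are likely to contain.
    with Path(csv_path).open(encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return list(reader.fieldnames or []), rows


if __name__ == "__main__":
    # The path is an assumption: lyrics.csv in the current directory.
    columns, rows = load_lyrics("lyrics.csv")
    print("columns:", columns)
    print("songs:", len(rows))
```

Each row comes back as a dictionary keyed by column name, so once the actual header is known, individual fields can be read by name.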
To play with the parser, use these steps:
- Clone the giit repo - https://github.com/v9y/giit
- Note the path to the docs/ directory in the cloned repo (say, docs_path).
- Run: python3 song_parser.py --input docs_path --output lyrics.csv
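The steps above can also be scripted. The following is a minimal sketch, not part of this repo: the helper name build_commands and the clone directory name giit are assumptions, it uses the current Python interpreter in place of python3, and it assumes git is installed and song_parser.py sits in the current directory. Only the --input and --output flags and the repo URL come from the steps above.

```python
import subprocess
import sys
from pathlib import Path

GIIT_REPO_URL = "https://github.com/v9y/giit"


def build_commands(clone_dir, output_csv="lyrics.csv"):
    """Return the clone and parse commands as argument lists.

    Hypothetical helper; clone_dir is where the giit repo will be cloned.
    """
    docs_path = Path(clone_dir) / "docs"  # the docs/ directory inside the clone
    return [
        ["git", "clone", GIIT_REPO_URL, str(clone_dir)],
        [sys.executable, "song_parser.py",
         "--input", str(docs_path), "--output", output_csv],
    ]


if __name__ == "__main__":
    # Run each step in order and stop at the first failure.
    for command in build_commands("giit"):
        subprocess.run(command, check=True)
```

Building the commands as lists rather than a single shell string avoids quoting problems if the clone path contains spaces.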