This is the midterm project of 622115012 for the 953481 Information Retrieval course, CAMT, CMU.
- Python
- Flask
- TF
- TF-IDF
- BM25
- Download the required lyric dataset from here, then place it in the assets folder.
- Download the required wiki datasets (100K 2020, 300K, 100K 2016, 1M 2016) from here, then place them in the assets folder.
- Run main in dataProcess.py to create parsed_data.pkl and clean_wiki_100.txt.
- Run main in bm25_model.py, tf_model.py, and tfidf_model.py to create the fitted models and vectorizers.
- Run main in main.py, then go to http://localhost:5000/index.
- Enjoy Searching 🥣
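
For readers unfamiliar with BM25, the ranking idea can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the code in bm25_model.py: the function name `bm25_scores`, the whitespace tokenization, the sample lyric lines, and the parameter values `k1=1.5` and `b=0.75` are all assumptions made for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document in `docs` for `query`.

    Hypothetical helper for illustration; tokenization is a plain
    lowercase whitespace split, and k1/b are commonly used defaults.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


# Made-up sample documents standing in for the lyric dataset.
lyrics = [
    "hello darkness my old friend",
    "hello from the other side",
    "let it be let it be",
]
scores = bm25_scores("hello darkness", lyrics)
best = max(range(len(lyrics)), key=scores.__getitem__)
print(best, scores)
```

The first document matches both query terms, so it ranks highest; the third matches none and scores zero. Plain TF would rank only by raw term counts, and TF-IDF adds the rarity weighting, while BM25 additionally saturates repeated terms and normalizes for document length.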