Project created by Ewelina Preś, Bartosz Pokora, and Franciszek Saliński as part of the "Exploratory Data Analysis" course at Warsaw University of Technology.
Sources:
Used technologies:
- Python 3.9.19
- pandas, numpy
- matplotlib, seaborn
- wordcloud
- transformers
- Canva
To recreate the results:
- Create a Python 3.9.19 virtual environment
- Download the repository
- Head to the directory Harry-Potter-dialogue-poster/
- Run the following command:
pip install -r requirements.txt
You should now be able to run all the Python scripts.
After exploring the dataset, we chose a few aspects of it to visualize. The charts we created are:
- Boxplot that shows the distribution of the number of words spoken by characters of each gender. We included only characters who said fewer than 500 words in order to remove outliers, which in our case are the main characters (most of them are men, so including them could make the boxplot misleading)
- Lineplot that visualizes how Harry Potter's emotions change throughout the movies
- Wordclouds that show the words most frequently spoken by specific characters in an interesting, eye-catching way
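
The filtering step behind the boxplot can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: it uses only the standard library instead of pandas, the rows are made-up toy data, and the names `ROWS`, `words_per_character` and `counts_by_gender` are hypothetical. It also assumes words are counted by splitting on whitespace, which may differ from how the project counts them.

```python
from collections import defaultdict
from statistics import median

# Hypothetical toy rows: (character, gender, dialogue line). Not project data.
ROWS = [
    ("A", "male", "w " * 600),   # stands in for a main character
    ("B", "male", "w " * 40),
    ("B", "male", "w " * 20),
    ("C", "female", "w " * 120),
    ("D", "female", "w " * 30),
    ("E", "male", "w " * 100),
]

WORD_LIMIT = 500  # cutoff described in the list above


def words_per_character(rows):
    """Sum whitespace-separated word counts per (character, gender)."""
    totals = defaultdict(int)
    for character, gender, line in rows:
        totals[(character, gender)] += len(line.split())
    return dict(totals)


def counts_by_gender(totals, limit=WORD_LIMIT):
    """Group word totals by gender, keeping only characters under the limit."""
    grouped = defaultdict(list)
    for (_character, gender), count in totals.items():
        if count < limit:
            grouped[gender].append(count)
    return {gender: sorted(counts) for gender, counts in grouped.items()}


grouped = counts_by_gender(words_per_character(ROWS))
for gender, counts in grouped.items():
    print(gender, "characters:", len(counts), "median words:", median(counts))
```

Per-gender lists like `grouped` are the kind of input a boxplot function takes, one box per list. Dropping character "A" here mirrors the reasoning above: a few very talkative main characters would otherwise stretch the scale and hide the differences among everyone else.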