This project analyzes a dataset of films with Python and presents the results as interactive visualizations in Tableau. The main objective is to gain insights into the film industry from the available data.
The dataset contains information about films: titles, MPAA ratings, budgets, gross earnings, release dates, genres, runtimes, ratings, and rating counts. Some entries have missing values. The dataset has been preprocessed to remove irrelevant data and handle those missing values.
The following preprocessing steps were performed on the dataset:
- Irrelevant data was removed to focus on the relevant information.
- Standardization techniques were applied to ensure consistency and uniformity in the data.
- Rows with more than three missing values were eliminated to maintain data integrity.
- NaN cells and "0" values were replaced with appropriate values so that they do not distort the analysis. In columns with null values, each null cell was filled with the mean of that column.
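The last two steps above can be sketched with pandas as follows. This is a minimal illustration, not the notebook's actual code: the column names, the sample rows, the `preprocess` helper, and the choice to count "0" as missing before rows are dropped are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame, numeric_cols: list[str], max_missing: int = 3) -> pd.DataFrame:
    """Hypothetical helper: drop sparse rows, then mean-fill numeric gaps."""
    out = df.copy()
    # Assumption: a "0" in a numeric column stands for a missing value.
    out[numeric_cols] = out[numeric_cols].replace(0, np.nan)
    # Keep only rows with at most `max_missing` missing values.
    out = out.loc[out.isna().sum(axis=1) <= max_missing].copy()
    # Fill the remaining gaps with the mean of each column.
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].mean())
    return out.reset_index(drop=True)


# Made-up sample rows, for illustration only.
films = pd.DataFrame(
    {
        "title": ["A", "B", "C", "D"],
        "budget": [100.0, 0.0, 300.0, np.nan],
        "gross": [500.0, 200.0, np.nan, np.nan],
        "runtime": [90.0, 120.0, 0.0, np.nan],
        "rating": [7.0, 6.0, 8.0, np.nan],
    }
)

clean = preprocess(films, ["budget", "gross", "runtime", "rating"])
print(clean)
```

In this sample, film "D" has four missing values and is dropped, while the gaps in "B" and "C" are filled with the means of the remaining rows.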
The dataset was first processed in a Jupyter Notebook, using Python libraries to analyze the film data. Tableau was then used to create interactive visualizations, including dashboards, for exploring the dataset.
To get started with this project, follow these steps:
- Clone the repository to your local machine: git clone https://github.com/MaxLopezSalgado/films_analysis.git
- Set up the required dependencies (listed below).
- Open the Jupyter Notebook file and run the code to perform the data analysis.
- Explore the Tableau visualizations by opening the provided Tableau project file.
The project requires the following dependencies:
- Python (version 3.10.2)
- Jupyter Notebook
- Tableau (public link to the project)
- Python libraries: pandas
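Before opening the notebook, the Python side of this list can be checked with a short script. This is a sketch under stated assumptions: the helper names are hypothetical, `notebook` is assumed to be the import name of the installed Jupyter Notebook package, and treating 3.10 as a minimum version is an assumption (the project only states that 3.10.2 was used).

```python
import sys
from importlib.util import find_spec

# Assumption: 3.10 is treated as a minimum; the project lists 3.10.2.
REQUIRED_PYTHON = (3, 10)
# Assumption: import names of the listed dependencies.
REQUIRED_MODULES = ["pandas", "notebook"]


def python_ok(minimum=REQUIRED_PYTHON):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= tuple(minimum)


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    if not python_ok():
        print("Python version is older than", REQUIRED_PYTHON)
    for name in missing_modules(REQUIRED_MODULES):
        print("Missing module:", name)
```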