Google Play Store EDA

Course Name: Data Science Lab (R4EC3012P)

Date: January-May 2023


Exploratory data analysis (EDA) is how data scientists analyse and investigate data sets: they look for patterns and anomalies (outliers), form hypotheses from their understanding of the data, and summarize its main characteristics, often with data visualization methods.

About The Project

Aim

We perform exploratory data analysis (EDA) on the Google Play Store data and report the resulting findings.

Description

The objective of this experiment is to deliver insights that help application developers understand customer demand better and popularize their products. In this project we examine the attributes in the data set that affect an application's popularity. We focus on answering questions such as:

Which category has the greatest number of installations?
How many free apps does the Play Store have?
Which is the most common category of apps on Play Store?
Which is the most expensive category?
Which category has the highest number of reviews on Play Store?

Getting Started

Prerequisites

You need a Python environment. You can refer here for the setup.
Python libraries
- NumPy: pip install numpy
- Seaborn: pip install seaborn
- Pandas: pip install pandas
- Matplotlib: pip install matplotlib

To install pip, you can refer here.

You also need the test data, which you can get from here or from Kaggle.

Installation

Clone the repo

git clone https://github.com/Yash-Desh/Google-Playstore-EDA.git

Theory and Approach

Data Cleaning

Our data set contains a large number of null values in the rating column, so we drop those rows. Some other columns have a smaller number of null values, which we replace with the mode of the respective column. The data set also contains duplicate rows for a single application; because these rows hold identical data, we drop the duplicates. Finally, we drop the rows that have a rating greater than 5.
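The cleaning steps above can be sketched as follows. This is a minimal illustration on a made-up toy frame, not the notebook's actual code: the column names (App, Category, Rating, Type) and the helper name clean are assumptions, so adjust them to match the columns in your copy of the data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Play Store data; names and values are assumptions.
df = pd.DataFrame({
    "App": ["A", "A", "B", "C", "D", "E"],
    "Category": ["FAMILY", "FAMILY", "GAME", "TOOLS", "GAME", "FAMILY"],
    "Rating": [4.5, 4.5, np.nan, 19.0, 3.8, 4.1],
    "Type": ["Free", "Free", "Free", "Paid", np.nan, "Free"],
})


def clean(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper that applies the four cleaning steps in order."""
    # 1. Drop rows whose rating is missing.
    out = frame.dropna(subset=["Rating"]).copy()
    # 2. Fill the remaining nulls with the mode of their column.
    for col in out.columns:
        if out[col].isna().any():
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    # 3. Drop rows that are exact duplicates of an earlier row.
    out = out.drop_duplicates()
    # 4. Drop rows with a rating above the maximum of 5.
    out = out[out["Rating"] <= 5]
    return out.reset_index(drop=True)


cleaned = clean(df)
print(cleaned)
```

Working on a copy keeps the raw frame untouched, so the cleaned and uncleaned data can be compared afterwards.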

Checking how many outliers are present and removing them

An outlier is a data object that deviates significantly from the rest of the data objects and behaves differently. Outliers can be caused by measurement or execution errors. The analysis of outlier data is referred to as outlier analysis or outlier mining. We find a point outlier in our dataset by filtering for ratings greater than 5:

df[df.Ratings>5]

Box plot and Histogram when outliers present:

Removing Outliers

df.drop([10472], inplace=True)
df[10470:10475]

Box plot and Histogram when outliers removed:
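The before-and-after plots can be reproduced along these lines. This is a rough sketch on made-up ratings rather than the project's plotting code; the function name plot_rating_distribution and the sample values are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up ratings with one impossible value (19.0) acting as the outlier.
ratings = pd.Series([4.5, 3.8, 4.1, 4.7, 19.0, 4.3], name="Rating")


def plot_rating_distribution(values: pd.Series, title: str):
    """Hypothetical helper: box plot and histogram side by side."""
    data = values.to_numpy()
    fig, (ax_box, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))
    ax_box.boxplot(data)
    ax_box.set_title(f"Box plot ({title})")
    ax_box.set_ylabel("Rating")
    ax_hist.hist(data, bins=20)
    ax_hist.set_title(f"Histogram ({title})")
    ax_hist.set_xlabel("Rating")
    return fig


fig_before = plot_rating_distribution(ratings, "outliers present")
fig_after = plot_rating_distribution(ratings[ratings <= 5], "outliers removed")
```

With the outlier present, the box plot is compressed near the bottom of the axis and the histogram shows a single isolated bar far to the right; after filtering, both plots cover only the valid rating range.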

Data visualization

Charts and graphs helped us uncover valuable insights from this complex data, revealing patterns and connections within user engagement metrics like app installations, ratings, and reviews across different categories. This approach highlighted popular app genres and showed how various factors impact app performance. Visualizing market trends and distribution of key variables guided decisions on app development, marketing, and user experience. Ultimately, these visuals provided a compelling way to communicate our findings, supporting the conclusions drawn from the analysis.

Results and Outcomes

The following graphs depict the results of the visualization:

Category VS Install:

Category VS Pricing:

Category VS Reviews:

Most of the apps are free, so developers should focus on creating free apps to reach a large customer base. More apps should be built in categories such as Events, Beauty, and Parenting, which have not been explored much but are still quite popular, with a large number of installations. To retain the customer base, apps should be updated regularly. Developers should also build apps whose content is available to everyone.

Most common category of apps on the Play Store is: Family
Percentage of free apps on the Play Store is: ~92%
Category with the greatest number of app installs on the Play Store is: Communication
Category with the greatest number of reviews is: Communication
Category with the most expensive apps on the Play Store is: Finance
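Summary figures like the ones above can be computed with a few pandas aggregations. The sketch below uses a small made-up frame, so its numbers are not the project's results. It assumes the columns are named Category, Type, Installs, Reviews, and Price and already hold numeric values, and it takes "most expensive category" to mean the category containing the highest-priced app, which is one possible reading.

```python
import pandas as pd

# Toy data; values are invented for illustration, not taken from the dataset.
apps = pd.DataFrame({
    "Category": ["FAMILY", "FAMILY", "COMMUNICATION", "FINANCE", "GAME"],
    "Type": ["Free", "Free", "Free", "Paid", "Free"],
    "Installs": [1_000, 5_000, 1_000_000, 100, 50_000],
    "Reviews": [10, 40, 9_000, 2, 700],
    "Price": [0.0, 0.0, 0.0, 399.99, 0.0],
})

# Most common category: the one with the most rows (apps).
most_common_category = apps["Category"].value_counts().idxmax()

# Share of free apps, as a percentage of all rows.
free_share = (apps["Type"] == "Free").mean() * 100

# Categories with the largest total installs and total reviews.
most_installed = apps.groupby("Category")["Installs"].sum().idxmax()
most_reviewed = apps.groupby("Category")["Reviews"].sum().idxmax()

# Category containing the single most expensive app (assumed definition).
most_expensive = apps.groupby("Category")["Price"].max().idxmax()

print(most_common_category, free_share, most_installed,
      most_reviewed, most_expensive)
```

Swapping max() for mean() in the last aggregation would instead rank categories by average price, which can give a different answer.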

Contributors

Chirag Patil
Yash Deshpande
Atharva Bendre
Shreyas Bhatlawande

Acknowledgements and Resources

Play Store EDA on Kaggle
