Skip to content

Advanced analytics involving data visualization and predictions with machine learning regarding to Car Fuel Consumption.

License

Notifications You must be signed in to change notification settings

AndrewBavuels/Car-Fuel-Consumption-Part-I-Analysis-before-Predictions

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

37 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

Car Fuel Consumption Part I - Analysis before Predictions ๐Ÿš—๐Ÿ“ˆ๐Ÿ”ฎ

This is the first part of a project series to show a case in which I will apply advanced analytics, involving data visualization and predictions with Machine Learning, regarding Car Fuel Consumption.

Let's Get the Party Started

1. Project description ๐Ÿ‘‡

Part I: Exploratory Data Analysis (EDA) Visualization with Python.

For this project, I used a "Car Fuel Consumption per gas fuel type" dataset from Kaggle. From the Kaggle Dataset context, there are 4 questions formulated:

  • Question #1: Which gas type consumes the most? E10 or SP98?
  • Question #2: How much is the consume?
  • Question #3: It consumes 0.4 liters more with E10 gas, isn't it?
  • Question #4: Which of the two fuels is cheaper, E10 or SP 98?

"All these questions are answered in the notebook of this repository."

Driving Fast And Furious GIF by The Fast Saga

Exploratory Data Analysis => Summary:

  • Performing data maintenance or cleaning: Duplicates, null-values, outliers
  • String operations: Normalize to lowercase, replacing numbers to categorical and vice-versa
  • Feature Engineering: Merging features, such as 'consumption rate' based on distance and consumed gas
  • Relational model transformation: For future project steps
  • Answering questions from our Exploratory Data Analysis: The main goal of this repository

Highlights:

Question #1: Which gas type consumes the most? E10 or SP98?

question_1

We can observe the consume rate (liter per kms), regarding the car speed records, is higher for the gas type SP98

Question #2: How much is the consumption?

Code Snippet:

question_2

Stats Overview

gas_type count mean std min 25% 50% 75% max
E10 138.0 0.304058 0.125085 0.04 0.22 0.295 0.3975 0.68
SP98 197.0 0.321421 0.126652 0.09 0.24 0.310 0.4000 0.73

Mean Analysis

gas_type consume_rate speed
E10 0.304058 43.289855
SP98 0.321421 41.497462

Median Analysis

gas_type consume_rate speed
E10 0.295 42.0
SP98 0.310 41.0
  • E10 has a mean consumption rate of 0.304 and a median of 0.295.
  • SP98 has a mean consumption rate of 0.321 and a median of 0.310.

Both gas types show similar patterns in terms of distribution and spread (std), with SP98 having a slightly higher average consumption rate compared to E10.

Question #3: It consumes 0.4 liters more with E10 gas, isn't it?

Here is the reason why I created a relational-ish table df:

Code Snippet:

question_3.1

I am showing in the previous image, the merge of the gas consumption records to the treated dataset, in order to figure out if E10 is consumed 0.4 liters more than SP98. Now, with a groupby, I am getting the mean and the median for such consumptions:

Mean Analysis

gas_type consume
E10 4.781159
SP98 4.668020

Code Snippet:

question_3.2

Median Analysis

gas_type consume
E10 4.7
SP98 4.6

Code Snippet:

question_3.3

It consumes 0.4 liters more with E10 gas, isn't it? The answer is Not necessarily.

It depended on the number of experiments per gas type. That is why I did the Feature Engineering in the first place, to mean creating the column 'consume_rate' to be accurate in telling which gas type consumes the most.

Question #4: Which of the two fuels is cheaper, E10 or SP 98?

From the Kaggle Dataset context:

  • E10 is sold for โ‚ฌ 1.38
  • SP98 is sold for โ‚ฌ 1.46

Code Snippet:

question_4

Comparative Price Analysis between E10 and SP98

In this analysis, we compare the mean and median prices paid for two types of gasoline: E10 and SP98.

Mean Price

The mean price is the average of all prices paid for each type of gasoline. Here are the results:

  • Mean price of E10: 6.598
  • Mean price of SP98: 6.815

This means that, on average, the price paid for E10 is lower than the price paid for SP98.

Is E10 cheaper than SP98? Yes, on average, E10 is cheaper than SP98.

Median Price

The median price is the middle value in a set of data when it is ordered from lowest to highest. Here are the results:

  • Median price of E10: 6.486
  • Median price of SP98: 6.716

This means that if we order all the prices paid for each type of gasoline, the central value for E10 is lower than the central value for SP98.

Is E10 cheaper than SP98? Yes, according to the median prices, E10 is cheaper than SP98.

Conclusion

Both the mean and median price analyses indicate that E10 is generally cheaper than SP98. This information can be useful for consumers looking for more economical options when choosing the type of gasoline for their vehicles.

After this Exploratory Data analysis, the next experiment will be Machine Learning training models for Gas Consumption predictions. We export the 'pre-processed' dataset as follows:

2. Technology stack ๐Ÿ’ป

Programming language:

Python Libraries:

  • pandas: For data manipulation and analysis.
  • numpy: For mathematical operations and array manipulation.
  • matplotlib.pyplot: For data visualization.
  • seaborn: For statistical data visualization.
  • re: For regular expression operations.
  • scipy: For scientific and technical computing.

Distribution platform

Computing environment

3. Folder structure ๐Ÿ“

โ””โ”€โ”€ project
    โ”œโ”€โ”€ data
    โ”‚   โ”œโ”€โ”€ raw
    โ”‚   โ”‚   โ””โ”€โ”€ measurements.csv
    โ”‚   โ””โ”€โ”€ pre_processed
    โ”‚   โ”‚   โ””โ”€โ”€ pre_processed_gas_df.csv
    โ”œโ”€โ”€ notebooks
    โ”‚   โ””โ”€โ”€ main.ipynb
    โ””โ”€โ”€ README.md    

4. Next steps ๐Ÿ’ก

Predictions

  • Do you have any hypothesis?
  • Can you make any kind of prediction: regression and/or classification?

Enriching the dataset

  • Obtain related data by web scraping or with APIs.

Database

  • Load the processed information into a database

Contact info๐Ÿ“ง

For further information, reach me at [email protected]

About

Advanced analytics involving data visualization and predictions with machine learning regarding to Car Fuel Consumption.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published