Obtain meaningful insights from the USA Flights dataset and predict flight delays across multiple airports in the USA. Also, predict and classify types of delays accurately using simple machine learning algorithms such as Random Forest, linear regression, and logistic regression. (Lighthouse Labs mid-term project!)

bnati5/Mid-term-project

Mid-term-project

Lighthouse Labs mid-term project.

Hello and Welcome!!!

The aim of this project is to predict delays on flights from the first 7 days of 2020 (1st of January - 7th of January).

This repository consists of the following files:

  • exploratory_analysis.ipynb: contains 10 questions to answer during the data exploration phase. They help us understand the data and become familiar with the different variables.
  • modeling.ipynb: contains instructions for the modeling part of the project. We recommend splitting the modeling tasks across several notebooks.
  • data_description.md: the place to look for information about specific attributes in the data.
  • sample_submission.csv: an example of how the submission for the modeling task should look.

==================== Exploratory Analysis Notebooks ==================================

============================================================================

================= Data Gathering, Cleaning & Feature Engineering Notebooks ====================

============================================================================

================= Modeling Notebooks =============================================

============================================================================

================= Readme Files and Guide Notebooks ====================================

============================================================================

================= Saved Models ==================================================

============================================================================

System Requirements

Operating System: Windows
Programming Language: Python

Please be careful with the large datasets.
A minimum of 16GB of RAM is needed to run the large multiclass classification files.
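
When memory is tight, one option is to stream a large CSV row by row instead of loading it whole. The sketch below is not taken from the project notebooks; it uses only the Python standard library, and the column names op_unique_carrier and arr_delay, the helper mean_delay_by_key, and the sample values are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict


def mean_delay_by_key(rows, key_col, delay_col):
    """Stream rows once and return {key: mean delay}, skipping blank delays."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        value = row.get(delay_col, "")
        if value in ("", None):
            continue  # missing delay value: leave it out of the mean
        totals[row[key_col]] += float(value)
        counts[row[key_col]] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Tiny in-memory stand-in for a large flights CSV (values are made up).
# For a real file, pass csv.DictReader(open(path, newline="")) instead.
sample = io.StringIO(
    "op_unique_carrier,arr_delay\n"
    "AA,10\n"
    "AA,20\n"
    "DL,\n"
    "DL,-5\n"
)
result = mean_delay_by_key(csv.DictReader(sample), "op_unique_carrier", "arr_delay")
print(result)
```

Because only running totals are kept, memory use grows with the number of distinct keys rather than with the number of rows.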

Data

We will be working with data from the air travel industry. There are four separate tables:

  1. flights: departure and arrival information about flights in the US in 2018 and 2019.
  2. fuel_comsumption: the fuel consumption of different airlines from 2015 to 2019, aggregated per month.
  3. passengers: the passenger totals on different routes from 2015 to 2019, aggregated per month.
  4. flights_test: departure and arrival information about flights in the US in January 2020. This table is used for evaluation. We are required to predict delays on flights from the first 7 days of 2020 (1st of January - 7th of January). A sample submission is provided in sample_submission.csv.
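
Since flights_test covers all of January 2020 but only the first 7 days are evaluated, the rows need to be narrowed to that window before predicting. A minimal sketch of that filter follows; it assumes the flight date is stored as an ISO string (YYYY-MM-DD) in a column named fl_date, and both the column names and the sample rows are hypothetical.

```python
from datetime import date

# Evaluation window stated in the project description.
EVAL_START = date(2020, 1, 1)
EVAL_END = date(2020, 1, 7)


def in_evaluation_window(fl_date: str) -> bool:
    """Return True if an ISO date string falls within 1-7 January 2020."""
    return EVAL_START <= date.fromisoformat(fl_date) <= EVAL_END


# Made-up rows standing in for flights_test records.
rows = [
    {"fl_date": "2020-01-01", "origin": "JFK"},
    {"fl_date": "2020-01-07", "origin": "LAX"},
    {"fl_date": "2020-01-08", "origin": "ORD"},
]
evaluation_rows = [r for r in rows if in_evaluation_window(r["fl_date"])]
print([r["origin"] for r in evaluation_rows])
```

Both bounds are inclusive, so flights on the 1st and the 7th of January are kept.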
