A machine learning project that predicts used car prices from features such as mileage, engine size, fuel type, and car age. The pipeline covers data cleaning, feature engineering, and the training and evaluation of multiple regression models, and it provides insight into the factors that influence car prices.
The dataset contains information on used cars, including mileage, engine size, fuel type, transmission, and price. The workflow has the following steps:
- Data Cleaning: Handling missing values, outliers, and inconsistencies.
- Feature Engineering: Adding new features (e.g., car age, brand categories) and transforming variables.
- Data Visualization: Analyzing patterns using histograms, scatterplots, and heatmaps.
- Model Building: Training multiple regression models to predict prices.
- Evaluation: Comparing models based on metrics like RMSE and R².
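
The cleaning and feature engineering steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the toy data, the column names (`brand`, `year`, `mileage`, `engine_size`, `fuel_type`, `price`), the helper `clean_and_engineer`, the median fill, the 1.5 × IQR outlier rule, and the one-hot encoding of brand and fuel type are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are assumptions, not the project's schema.
raw = pd.DataFrame({
    "brand": ["Toyota", "BMW", "Toyota", "Ford", "BMW", "Kia"],
    "year": [2015, 2018, 2012, 2016, 2019, 2017],
    "mileage": [60000, 30000, np.nan, 80000, 15000, 2_000_000],
    "engine_size": [1.6, 2.0, 1.4, 1.0, 3.0, 1.2],
    "fuel_type": ["Petrol", "Diesel", "Petrol", "Petrol", "Diesel", "Petrol"],
    "price": [9000, 22000, 6000, 7500, 35000, 8000],
})


def clean_and_engineer(df, reference_year=2024):
    """Illustrative cleaning and feature engineering (assumed rules)."""
    df = df.copy()

    # Missing values: fill gaps in mileage with the column median.
    df["mileage"] = df["mileage"].fillna(df["mileage"].median())

    # Outliers: keep only rows whose mileage lies within 1.5 * IQR of the quartiles.
    q1, q3 = df["mileage"].quantile([0.25, 0.75])
    iqr = q3 - q1
    df = df[df["mileage"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

    # New feature: car age relative to an assumed reference year.
    df["car_age"] = reference_year - df["year"]

    # Categorical features: one-hot encode fuel type and brand.
    df = pd.get_dummies(df, columns=["fuel_type", "brand"])
    return df.reset_index(drop=True)


cleaned = clean_and_engineer(raw)
print(cleaned.head())
```

Here the row with an implausible mileage is dropped and the missing mileage is filled before the new columns are added. A real dataset may call for different imputation and outlier rules.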

Models trained and compared:
- Random Forest Regressor
- MLP Regressor
- Support Vector Regressor (SVR)
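
As a rough sketch of how these three models could be trained and compared on RMSE and R² with scikit-learn: the synthetic data (price in thousands), the feature set, the train/test split, and every hyperparameter below are assumptions chosen for illustration, not the settings used in this project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the cleaned dataset (assumed features, price in thousands).
rng = np.random.default_rng(0)
n = 300
mileage = rng.uniform(5_000, 150_000, n)
engine_size = rng.uniform(1.0, 3.0, n)
car_age = rng.uniform(0, 15, n)
X = np.column_stack([mileage, engine_size, car_age])
y = 20 - 0.00005 * mileage + 4 * engine_size - 0.6 * car_age + rng.normal(0, 0.5, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# MLP and SVR are sensitive to feature scale, so they get a StandardScaler.
models = {
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "MLP": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
    ),
    "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    r2 = float(r2_score(y_test, pred))
    results[name] = {"rmse": rmse, "r2": r2}
    print(f"{name}: RMSE={rmse:.3f}, R2={r2:.3f}")
```

Lower RMSE and higher R² indicate a better fit on the held-out data. Scores on this synthetic data say nothing about how the models rank on the real dataset.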

The best-performing model captured the key patterns in the dataset and highlighted mileage and engine size as important factors in determining car prices.
- Python
- Libraries: pandas, NumPy, scikit-learn, seaborn, Matplotlib