Cardiovascular diseases (CVDs) are the leading cause of death globally, taking an estimated 17.9 million lives each year, or 31% of all deaths worldwide. Four out of five CVD deaths are due to heart attacks and strokes, and one-third of these deaths occur prematurely in people under 70 years of age. Heart failure is a common event caused by CVDs, and this dataset contains 11 features that can be used to predict possible heart disease.
People with cardiovascular disease, or at high cardiovascular risk due to one or more risk factors (such as hypertension, diabetes, hyperlipidaemia, or already established disease), need early detection and management, and a machine learning model can be of great help here.
This notebook uses a dataset from Kaggle to train machine learning models that predict heart disease. It demonstrates data preprocessing, exploratory data analysis, model training, and visualization. The performance of four classifiers is compared: decision tree, random forest, K-nearest neighbour, and support vector machine.
Dataset: https://www.kaggle.com/datasets/fedesoriano/heart-failure-prediction/
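As a rough illustration of the comparison described above, the four classifiers can be fitted on the same train/test split and scored on the held-out part. This is a minimal sketch, not necessarily the exact pipeline used later in the notebook: it assumes scikit-learn, the helper name `compare_models`, the 80/20 split, the hyperparameters, and the feature scaling for K-nearest neighbour and SVM are all illustrative choices, and the data is a synthetic stand-in with 11 features rather than the Kaggle file.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, seed=0):
    """Fit four classifiers on one stratified split; return test accuracy per model.

    Hypothetical helper for illustration only.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    models = {
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        # KNN and SVM are distance-based, so features are standardized first.
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf", random_state=seed)),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)  # mean accuracy on the test split
    return scores


# Synthetic stand-in with 11 numeric features (assumption: NOT the real dataset).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=11, n_informative=6, random_state=0
)
demo_scores = compare_models(X_demo, y_demo)
for name, acc in sorted(demo_scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

With the real data, `X` and `y` would be the preprocessed feature matrix and the heart disease label. Accuracy on a single split is only a first look; cross-validation or additional metrics such as recall may be more informative for a medical screening task.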