Capstone Project - Udacity Azure Machine Learning Engineer

This is the capstone, the final project in the Udacity Azure Machine Learning Engineer Nanodegree. It draws on Azure cloud machine learning technologies and is the last step in putting into practice what I learned in the Nanodegree.

Project Set Up and Installation

This project requires creating a compute instance to run the Jupyter notebooks and a compute cluster to run the experiments.
The dataset needs to be selected manually.
Two experiments were run, one using AutoML and one using HyperDrive.
The model with the better metrics was deployed and consumed.

Dataset

Name: heart_failure_clinical_records_dataset.csv

Overview

I downloaded the dataset from the UC Irvine Machine Learning Repository.

Heart failure (HF) occurs when the heart cannot pump enough blood to meet the needs of the body. Available electronic medical records of patients quantify symptoms, body features, and clinical laboratory test values, which can be used to perform biostatistics analysis aimed at highlighting patterns and correlations otherwise undetectable by medical doctors. Machine learning, in particular, can predict patients’ survival from their data and can identify the most important features among those included in their medical records.

Task

This is a binary classification problem: from the clinical features listed below, I try to predict whether the patient dies during the follow-up period (yes or no). The target variable is "death event".

Thirteen (13) clinical attributes (twelve input features plus the target):

age: age of the patient (years)
anaemia: decrease of red blood cells or hemoglobin (boolean)
high blood pressure: if the patient has hypertension (boolean)
creatinine phosphokinase (CPK): level of the CPK enzyme in the blood (mcg/L)
diabetes: if the patient has diabetes (boolean)
ejection fraction: percentage of blood leaving the heart at each contraction (percentage)
platelets: platelets in the blood (kiloplatelets/mL)
sex: woman or man (binary)
serum creatinine: level of serum creatinine in the blood (mg/dL)
serum sodium: level of serum sodium in the blood (mEq/L)
smoking: if the patient smokes or not (boolean)
time: follow-up period (days)
[target] death event: if the patient died during the follow-up period (boolean)
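For reference, the attribute list above can be written down as a small schema. This is only a sketch: the snake_case column names are assumed to match the header of the CSV file (only DEATH_EVENT is confirmed by the AutoML configuration in this README), and split_features_target is a hypothetical helper, not part of the project code.

```python
# Column names are assumed to match the CSV header; only DEATH_EVENT is
# confirmed elsewhere in this README (label_column_name).
FEATURE_COLUMNS = [
    "age", "anaemia", "high_blood_pressure", "creatinine_phosphokinase",
    "diabetes", "ejection_fraction", "platelets", "sex",
    "serum_creatinine", "serum_sodium", "smoking", "time",
]
TARGET_COLUMN = "DEATH_EVENT"


def split_features_target(row):
    """Split one record (a dict keyed by column name) into (features, target)."""
    missing = [c for c in FEATURE_COLUMNS + [TARGET_COLUMN] if c not in row]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    features = {c: row[c] for c in FEATURE_COLUMNS}
    return features, row[TARGET_COLUMN]
```

A check like this makes it explicit that twelve columns are model inputs and one column is the label.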

Access

The notebook reads the data directly from its link in the UCI repository and loads it as an Azure Machine Learning TabularDataset.

Automated ML

I created a compute instance of size "STANDARD_D3_V2" to run the Jupyter notebook in Azure.
I imported the dataset as a TabularDataset.
The settings I used for AutoML were:

"primary_metric",
"experiment_timeout_minutes",
"enable_early_stopping",
"n_cross_validations",
"max_concurrent_iterations"

automl_settings = {
    "primary_metric": "accuracy",
    "experiment_timeout_minutes": 30,
    "enable_early_stopping": True,
    "n_cross_validations": 3,
    "max_concurrent_iterations": 4,
}

automl_config = AutoMLConfig(
    compute_target=compute_target,
    task="classification",
    training_data=train,
    label_column_name="DEATH_EVENT",
    **automl_settings,
)

Results

The best-performing algorithm was "VotingEnsemble", with an accuracy of 87.48%.
Some of the other metrics, as shown in the screenshot, are the following:

precision_score_micro 0.8748348348348348
recall_score_macro 0.8428914238803412
norm_macro_recall 0.6857828477606823
AUC_micro 0.9212086031977925
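Two of the metrics above are related, which gives a quick consistency check. The sketch below assumes that norm_macro_recall is macro-averaged recall rescaled so that random-chance performance (1 / number of classes) maps to 0 and perfect recall maps to 1; the function name and signature are illustrative and not taken from the Azure ML SDK.

```python
def norm_macro_recall(recall_macro, n_classes):
    """Rescale macro recall so chance level maps to 0 and perfect recall to 1."""
    chance = 1.0 / n_classes
    return (recall_macro - chance) / (1.0 - chance)


# recall_score_macro as reported above; the task has two classes.
reported_recall_macro = 0.8428914238803412
value = norm_macro_recall(reported_recall_macro, n_classes=2)
print(value)
```

Under that assumption, the computed value agrees with the reported norm_macro_recall of 0.6857828477606823 up to floating-point rounding.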

Run Details
The algorithms that were run are listed in the screenshot below.
The model performance could still be improved by allowing more iterations, collecting more data, and including more features.

Hyperparameter Tuning

I used LogisticRegression for this experiment because it is easy to interpret and works well on classification problems.
I used RandomParameterSampling over three hyperparameters of this model: solver, max_iter, and C.

RandomParameterSampling({'C': choice(0.01, 0.1, 1, 10, 100), 'max_iter' : choice(50,75,100,125,150,175,200), 'solver' : choice('liblinear','sag','lbfgs', 'saga')})

The primary metric for this problem was "Accuracy", and the goal was to maximize it.
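To illustrate what random sampling over this search space means, here is a minimal standard-library sketch. It is not the HyperDrive implementation: it simply draws each hyperparameter uniformly and independently from the same choices listed above, and sample_configuration is a hypothetical helper.

```python
import random

# Same discrete search space as the RandomParameterSampling call above.
SEARCH_SPACE = {
    "C": [0.01, 0.1, 1, 10, 100],
    "max_iter": [50, 75, 100, 125, 150, 175, 200],
    "solver": ["liblinear", "sag", "lbfgs", "saga"],
}


def sample_configuration(space, rng):
    """Draw one value per hyperparameter, uniformly and independently."""
    return {name: rng.choice(values) for name, values in space.items()}


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
trials = [sample_configuration(SEARCH_SPACE, rng) for _ in range(5)]

# Size of the full grid, for comparison with the number of sampled trials.
total_combinations = 1
for values in SEARCH_SPACE.values():
    total_combinations *= len(values)
```

The full grid has 5 × 7 × 4 = 140 combinations, so random sampling explores only a fraction of it per experiment, trading coverage for run time.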

Results

The best accuracy was 92%.
The parameters of the best model were: ['--C', '0.1', '--max_iter', '50', '--solver', 'liblinear']
I could widen the parameter ranges in the search space. I could also change the sampling method, trading run time against how thoroughly the space is searched.
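The best-run parameters are reported as a command-line argument list, which suggests that the training script parses them with something like argparse. The sketch below shows one way that could look; the actual train.py is not reproduced here, so parse_args, the default values, and the help strings are assumptions.

```python
import argparse


def parse_args(argv):
    """Parse the three tuned hyperparameters from a command-line style list."""
    parser = argparse.ArgumentParser()
    # Defaults below are placeholders, not values taken from the project.
    parser.add_argument("--C", type=float, default=1.0,
                        help="inverse of regularization strength")
    parser.add_argument("--max_iter", type=int, default=100,
                        help="maximum number of solver iterations")
    parser.add_argument("--solver", type=str, default="lbfgs",
                        choices=["liblinear", "sag", "lbfgs", "saga"],
                        help="optimization algorithm")
    return parser.parse_args(argv)


# Argument list of the best run, as reported above.
best_run_arguments = ['--C', '0.1', '--max_iter', '50', '--solver', 'liblinear']
args = parse_args(best_run_arguments)
print(args.C, args.max_iter, args.solver)
```

The parsed values would then typically be passed on as the C, max_iter, and solver arguments of scikit-learn's LogisticRegression.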

Best Model Screenshot:

Run Details Screenshot:

Model Deployment

Since the HyperDrive experiment gave the best metric, i.e. an accuracy of 92%, I deployed that model.

Deployed Service Screenshot:

Sample input provided to the endpoint: the endpoint expects the sample as JSON. The code and the sample are shown in the screenshot above for clarity.
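Because the original request is only visible in a screenshot, here is a hedged sketch of how such a JSON request could be built and sent. The {"data": [...]} wrapper, the column names, the Bearer-key header, and the helpers build_payload and query_endpoint are assumptions; the record values are made-up placeholders, not a row from the dataset, and the scoring URI must be taken from the deployed service.

```python
import json
import urllib.request


def build_payload(records):
    """Serialize a list of feature dicts into a JSON request body."""
    return json.dumps({"data": records})


def query_endpoint(scoring_uri, payload, api_key=None):
    """POST the payload to the scoring URI and return the decoded response."""
    headers = {"Content-Type": "application/json"}
    if api_key:  # only needed when key authentication is enabled
        headers["Authorization"] = f"Bearer {api_key}"
    request = urllib.request.Request(
        scoring_uri, data=payload.encode("utf-8"), headers=headers, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder values for illustration only; not taken from the dataset.
sample_record = {
    "age": 60, "anaemia": 0, "creatinine_phosphokinase": 250, "diabetes": 0,
    "ejection_fraction": 38, "high_blood_pressure": 0, "platelets": 262000,
    "serum_creatinine": 1.1, "serum_sodium": 137, "sex": 1, "smoking": 0,
    "time": 115,
}
payload = build_payload([sample_record])
```

Calling query_endpoint(scoring_uri, payload) with the URI shown for the deployed service would then return whatever the scoring script responds with.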

Screen Recording

Link: https://youtu.be/sNhuO7utmf0

The sample request to the endpoint is shown at the end of the video.

