diff --git a/ML Algorithms/Parkinson Disease Detection using SVM.ipynb b/ML Algorithms/Parkinson Disease Detection using SVM.ipynb new file mode 100644 index 000000000..046f52e7c --- /dev/null +++ b/ML Algorithms/Parkinson Disease Detection using SVM.ipynb @@ -0,0 +1,1678 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Parkinson's Disease Detection using SVMs" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Building a system that can detect Parkinson's disease in a patient from biomedical voice measurements" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Parkinson's disease- \n", + "It is a progressive nervous system disorder that affects movement, leading to shaking, stiffness, and difficulty with walking, balance, and coordination. Symptoms usually begin gradually and worsen over time." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Workflow-\n", + " Parkinson's Data ----> Data Preprocessing ----> Train/Test Split ----> Support Vector Machine Classifier\n", + " \n", + " New Data ----> Trained SVM classifier ----> Prediction of disease" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Importing the Dependencies" + ] + }, + { + "cell_type": "code", + "execution_count": 83, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np \n", + "import pandas as pd \n", + "import matplotlib.pyplot as plt \n", + "import seaborn as sns\n", + "from sklearn.model_selection import train_test_split, GridSearchCV\n", + "from sklearn.preprocessing import StandardScaler # standardizes features to a common scale\n", + "from sklearn import svm # SVM model\n", + "from sklearn.metrics import accuracy_score # for evaluation\n", + "from sklearn.metrics import classification_report, confusion_matrix" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### About Dataset used-\n", + " I 
have used the Parkinson's dataset that is freely available on kaggle.com.\n", + " \n", + " About dataset-\n", + " This dataset is composed of a range of biomedical voice measurements from 31 people, 23 with Parkinson's disease (PD). \n", + " Each column in the table is a particular voice measure, and each row corresponds to one of 195 voice recordings from \n", + " these individuals (\"name\" column). The main aim of the data is to discriminate healthy people from those with PD, \n", + " according to the \"status\" column, which is set to 0 for healthy and 1 for PD.\n", + " \n", + " Attribute Information:\n", + "\n", + " Matrix column entries (attributes):\n", + " name - ASCII subject name and recording number\n", + " MDVP:Fo(Hz) - Average vocal fundamental frequency\n", + " MDVP:Fhi(Hz) - Maximum vocal fundamental frequency\n", + " MDVP:Flo(Hz) - Minimum vocal fundamental frequency\n", + " MDVP:Jitter(%),MDVP:Jitter(Abs),MDVP:RAP,MDVP:PPQ,Jitter:DDP - Several\n", + " measures of variation in fundamental frequency\n", + " MDVP:Shimmer,MDVP:Shimmer(dB),Shimmer:APQ3,Shimmer:APQ5,MDVP:APQ,Shimmer:DDA - Several measures of variation in amplitude\n", + " NHR,HNR - Two measures of ratio of noise to tonal components in the voice\n", + " status - Health status of the subject (one) - Parkinson's, (zero) - healthy\n", + " RPDE,D2 - Two nonlinear dynamical complexity measures\n", + " DFA - Signal fractal scaling exponent\n", + " spread1,spread2,PPE - Three nonlinear measures of fundamental frequency variation" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Data Collection and Analysis" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "#Loading our dataset\n", + "parkinsons_data = pd.read_csv('parkinsons.csv')" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "<tr><th></th><th>name</th><th>MDVP:Fo(Hz)</th><th>MDVP:Fhi(Hz)</th><th>MDVP:Flo(Hz)</th><th>MDVP:Jitter(%)</th><th>MDVP:Jitter(Abs)</th><th>MDVP:RAP</th><th>MDVP:PPQ</th><th>Jitter:DDP</th><th>MDVP:Shimmer</th><th>...</th><th>Shimmer:DDA</th><th>NHR</th><th>HNR</th><th>status</th><th>RPDE</th><th>DFA</th><th>spread1</th><th>spread2</th><th>D2</th><th>PPE</th></tr>\n",
+ "<tr><th>0</th><td>phon_R01_S01_1</td><td>119.992</td><td>157.302</td><td>74.997</td><td>0.00784</td><td>0.00007</td><td>0.00370</td><td>0.00554</td><td>0.01109</td><td>0.04374</td><td>...</td><td>0.06545</td><td>0.02211</td><td>21.033</td><td>1</td><td>0.414783</td><td>0.815285</td><td>-4.813031</td><td>0.266482</td><td>2.301442</td><td>0.284654</td></tr>\n",
+ "<tr><th>1</th><td>phon_R01_S01_2</td><td>122.400</td><td>148.650</td><td>113.819</td><td>0.00968</td><td>0.00008</td><td>0.00465</td><td>0.00696</td><td>0.01394</td><td>0.06134</td><td>...</td><td>0.09403</td><td>0.01929</td><td>19.085</td><td>1</td><td>0.458359</td><td>0.819521</td><td>-4.075192</td><td>0.335590</td><td>2.486855</td><td>0.368674</td></tr>\n",
+ "<tr><th>2</th><td>phon_R01_S01_3</td><td>116.682</td><td>131.111</td><td>111.555</td><td>0.01050</td><td>0.00009</td><td>0.00544</td><td>0.00781</td><td>0.01633</td><td>0.05233</td><td>...</td><td>0.08270</td><td>0.01309</td><td>20.651</td><td>1</td><td>0.429895</td><td>0.825288</td><td>-4.443179</td><td>0.311173</td><td>2.342259</td><td>0.332634</td></tr>\n",
+ "<tr><th>3</th><td>phon_R01_S01_4</td><td>116.676</td><td>137.871</td><td>111.366</td><td>0.00997</td><td>0.00009</td><td>0.00502</td><td>0.00698</td><td>0.01505</td><td>0.05492</td><td>...</td><td>0.08771</td><td>0.01353</td><td>20.644</td><td>1</td><td>0.434969</td><td>0.819235</td><td>-4.117501</td><td>0.334147</td><td>2.405554</td><td>0.368975</td></tr>\n",
+ "<tr><th>4</th><td>phon_R01_S01_5</td><td>116.014</td><td>141.781</td><td>110.655</td><td>0.01284</td><td>0.00011</td><td>0.00655</td><td>0.00908</td><td>0.01966</td><td>0.06425</td><td>...</td><td>0.10470</td><td>0.01767</td><td>19.649</td><td>1</td><td>0.417356</td><td>0.823484</td><td>-3.747787</td><td>0.234513</td><td>2.332180</td><td>0.410335</td></tr>\n",
+ "</table>\n",
+ "<p>5 rows × 24 columns</p>\n",
+ "</div>
" + ], + "text/plain": [ + " name MDVP:Fo(Hz) MDVP:Fhi(Hz) MDVP:Flo(Hz) MDVP:Jitter(%) \\\n", + "0 phon_R01_S01_1 119.992 157.302 74.997 0.00784 \n", + "1 phon_R01_S01_2 122.400 148.650 113.819 0.00968 \n", + "2 phon_R01_S01_3 116.682 131.111 111.555 0.01050 \n", + "3 phon_R01_S01_4 116.676 137.871 111.366 0.00997 \n", + "4 phon_R01_S01_5 116.014 141.781 110.655 0.01284 \n", + "\n", + " MDVP:Jitter(Abs) MDVP:RAP MDVP:PPQ Jitter:DDP MDVP:Shimmer ... \\\n", + "0 0.00007 0.00370 0.00554 0.01109 0.04374 ... \n", + "1 0.00008 0.00465 0.00696 0.01394 0.06134 ... \n", + "2 0.00009 0.00544 0.00781 0.01633 0.05233 ... \n", + "3 0.00009 0.00502 0.00698 0.01505 0.05492 ... \n", + "4 0.00011 0.00655 0.00908 0.01966 0.06425 ... \n", + "\n", + " Shimmer:DDA NHR HNR status RPDE DFA spread1 \\\n", + "0 0.06545 0.02211 21.033 1 0.414783 0.815285 -4.813031 \n", + "1 0.09403 0.01929 19.085 1 0.458359 0.819521 -4.075192 \n", + "2 0.08270 0.01309 20.651 1 0.429895 0.825288 -4.443179 \n", + "3 0.08771 0.01353 20.644 1 0.434969 0.819235 -4.117501 \n", + "4 0.10470 0.01767 19.649 1 0.417356 0.823484 -3.747787 \n", + "\n", + " spread2 D2 PPE \n", + "0 0.266482 2.301442 0.284654 \n", + "1 0.335590 2.486855 0.368674 \n", + "2 0.311173 2.342259 0.332634 \n", + "3 0.334147 2.405554 0.368975 \n", + "4 0.234513 2.332180 0.410335 \n", + "\n", + "[5 rows x 24 columns]" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Printing the first 5 rows of our dataset\n", + "parkinsons_data.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(195, 24)" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Number of rows and columns in our dataframe\n", + "parkinsons_data.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + 
"output_type": "stream", + "text": [ + "<class 'pandas.core.frame.DataFrame'>\n", + "RangeIndex: 195 entries, 0 to 194\n", + "Data columns (total 24 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 name 195 non-null object \n", + " 1 MDVP:Fo(Hz) 195 non-null float64\n", + " 2 MDVP:Fhi(Hz) 195 non-null float64\n", + " 3 MDVP:Flo(Hz) 195 non-null float64\n", + " 4 MDVP:Jitter(%) 195 non-null float64\n", + " 5 MDVP:Jitter(Abs) 195 non-null float64\n", + " 6 MDVP:RAP 195 non-null float64\n", + " 7 MDVP:PPQ 195 non-null float64\n", + " 8 Jitter:DDP 195 non-null float64\n", + " 9 MDVP:Shimmer 195 non-null float64\n", + " 10 MDVP:Shimmer(dB) 195 non-null float64\n", + " 11 Shimmer:APQ3 195 non-null float64\n", + " 12 Shimmer:APQ5 195 non-null float64\n", + " 13 MDVP:APQ 195 non-null float64\n", + " 14 Shimmer:DDA 195 non-null float64\n", + " 15 NHR 195 non-null float64\n", + " 16 HNR 195 non-null float64\n", + " 17 status 195 non-null int64 \n", + " 18 RPDE 195 non-null float64\n", + " 19 DFA 195 non-null float64\n", + " 20 spread1 195 non-null float64\n", + " 21 spread2 195 non-null float64\n", + " 22 D2 195 non-null float64\n", + " 23 PPE 195 non-null float64\n", + "dtypes: float64(22), int64(1), object(1)\n", + "memory usage: 36.7+ KB\n" + ] + } + ], + "source": [ + "# Getting more information about the dataset\n", + "parkinsons_data.info()" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "name 0\n", + "MDVP:Fo(Hz) 0\n", + "MDVP:Fhi(Hz) 0\n", + "MDVP:Flo(Hz) 0\n", + "MDVP:Jitter(%) 0\n", + "MDVP:Jitter(Abs) 0\n", + "MDVP:RAP 0\n", + "MDVP:PPQ 0\n", + "Jitter:DDP 0\n", + "MDVP:Shimmer 0\n", + "MDVP:Shimmer(dB) 0\n", + "Shimmer:APQ3 0\n", + "Shimmer:APQ5 0\n", + "MDVP:APQ 0\n", + "Shimmer:DDA 0\n", + "NHR 0\n", + "HNR 0\n", + "status 0\n", + "RPDE 0\n", + "DFA 0\n", + "spread1 0\n", + "spread2 0\n", + "D2 0\n", + "PPE 0\n", + "dtype: int64" + ] + 
}, + "execution_count": 25, + 
"metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Checking for missing values in each column\n", + "parkinsons_data.isnull().sum()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### No column contains missing values, so no imputation is needed before modelling" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "<tr><th></th><th>MDVP:Fo(Hz)</th><th>MDVP:Fhi(Hz)</th><th>MDVP:Flo(Hz)</th><th>MDVP:Jitter(%)</th><th>MDVP:Jitter(Abs)</th><th>MDVP:RAP</th><th>MDVP:PPQ</th><th>Jitter:DDP</th><th>MDVP:Shimmer</th><th>MDVP:Shimmer(dB)</th><th>...</th><th>Shimmer:DDA</th><th>NHR</th><th>HNR</th><th>status</th><th>RPDE</th><th>DFA</th><th>spread1</th><th>spread2</th><th>D2</th><th>PPE</th></tr>\n",
+ "<tr><th>count</th><td>195.000000</td><td>195.000000</td><td>195.000000</td><td>195.000000</td><td>195.000000</td><td>195.000000</td><td>195.000000</td><td>195.000000</td><td>195.000000</td><td>195.000000</td><td>...</td><td>195.000000</td><td>195.000000</td><td>195.000000</td><td>195.000000</td><td>195.000000</td><td>195.000000</td><td>195.000000</td><td>195.000000</td><td>195.000000</td><td>195.000000</td></tr>\n",
+ "<tr><th>mean</th><td>154.228641</td><td>197.104918</td><td>116.324631</td><td>0.006220</td><td>0.000044</td><td>0.003306</td><td>0.003446</td><td>0.009920</td><td>0.029709</td><td>0.282251</td><td>...</td><td>0.046993</td><td>0.024847</td><td>21.885974</td><td>0.753846</td><td>0.498536</td><td>0.718099</td><td>-5.684397</td><td>0.226510</td><td>2.381826</td><td>0.206552</td></tr>\n",
+ "<tr><th>std</th><td>41.390065</td><td>91.491548</td><td>43.521413</td><td>0.004848</td><td>0.000035</td><td>0.002968</td><td>0.002759</td><td>0.008903</td><td>0.018857</td><td>0.194877</td><td>...</td><td>0.030459</td><td>0.040418</td><td>4.425764</td><td>0.431878</td><td>0.103942</td><td>0.055336</td><td>1.090208</td><td>0.083406</td><td>0.382799</td><td>0.090119</td></tr>\n",
+ "<tr><th>min</th><td>88.333000</td><td>102.145000</td><td>65.476000</td><td>0.001680</td><td>0.000007</td><td>0.000680</td><td>0.000920</td><td>0.002040</td><td>0.009540</td><td>0.085000</td><td>...</td><td>0.013640</td><td>0.000650</td><td>8.441000</td><td>0.000000</td><td>0.256570</td><td>0.574282</td><td>-7.964984</td><td>0.006274</td><td>1.423287</td><td>0.044539</td></tr>\n",
+ "<tr><th>25%</th><td>117.572000</td><td>134.862500</td><td>84.291000</td><td>0.003460</td><td>0.000020</td><td>0.001660</td><td>0.001860</td><td>0.004985</td><td>0.016505</td><td>0.148500</td><td>...</td><td>0.024735</td><td>0.005925</td><td>19.198000</td><td>1.000000</td><td>0.421306</td><td>0.674758</td><td>-6.450096</td><td>0.174351</td><td>2.099125</td><td>0.137451</td></tr>\n",
+ "<tr><th>50%</th><td>148.790000</td><td>175.829000</td><td>104.315000</td><td>0.004940</td><td>0.000030</td><td>0.002500</td><td>0.002690</td><td>0.007490</td><td>0.022970</td><td>0.221000</td><td>...</td><td>0.038360</td><td>0.011660</td><td>22.085000</td><td>1.000000</td><td>0.495954</td><td>0.722254</td><td>-5.720868</td><td>0.218885</td><td>2.361532</td><td>0.194052</td></tr>\n",
+ "<tr><th>75%</th><td>182.769000</td><td>224.205500</td><td>140.018500</td><td>0.007365</td><td>0.000060</td><td>0.003835</td><td>0.003955</td><td>0.011505</td><td>0.037885</td><td>0.350000</td><td>...</td><td>0.060795</td><td>0.025640</td><td>25.075500</td><td>1.000000</td><td>0.587562</td><td>0.761881</td><td>-5.046192</td><td>0.279234</td><td>2.636456</td><td>0.252980</td></tr>\n",
+ "<tr><th>max</th><td>260.105000</td><td>592.030000</td><td>239.170000</td><td>0.033160</td><td>0.000260</td><td>0.021440</td><td>0.019580</td><td>0.064330</td><td>0.119080</td><td>1.302000</td><td>...</td><td>0.169420</td><td>0.314820</td><td>33.047000</td><td>1.000000</td><td>0.685151</td><td>0.825288</td><td>-2.434031</td><td>0.450493</td><td>3.671155</td><td>0.527367</td></tr>\n",
+ "</table>\n",
+ "<p>8 rows × 23 columns</p>\n",
+ "</div>
" + ], + "text/plain": [ + " MDVP:Fo(Hz) MDVP:Fhi(Hz) MDVP:Flo(Hz) MDVP:Jitter(%) \\\n", + "count 195.000000 195.000000 195.000000 195.000000 \n", + "mean 154.228641 197.104918 116.324631 0.006220 \n", + "std 41.390065 91.491548 43.521413 0.004848 \n", + "min 88.333000 102.145000 65.476000 0.001680 \n", + "25% 117.572000 134.862500 84.291000 0.003460 \n", + "50% 148.790000 175.829000 104.315000 0.004940 \n", + "75% 182.769000 224.205500 140.018500 0.007365 \n", + "max 260.105000 592.030000 239.170000 0.033160 \n", + "\n", + " MDVP:Jitter(Abs) MDVP:RAP MDVP:PPQ Jitter:DDP MDVP:Shimmer \\\n", + "count 195.000000 195.000000 195.000000 195.000000 195.000000 \n", + "mean 0.000044 0.003306 0.003446 0.009920 0.029709 \n", + "std 0.000035 0.002968 0.002759 0.008903 0.018857 \n", + "min 0.000007 0.000680 0.000920 0.002040 0.009540 \n", + "25% 0.000020 0.001660 0.001860 0.004985 0.016505 \n", + "50% 0.000030 0.002500 0.002690 0.007490 0.022970 \n", + "75% 0.000060 0.003835 0.003955 0.011505 0.037885 \n", + "max 0.000260 0.021440 0.019580 0.064330 0.119080 \n", + "\n", + " MDVP:Shimmer(dB) ... Shimmer:DDA NHR HNR status \\\n", + "count 195.000000 ... 195.000000 195.000000 195.000000 195.000000 \n", + "mean 0.282251 ... 0.046993 0.024847 21.885974 0.753846 \n", + "std 0.194877 ... 0.030459 0.040418 4.425764 0.431878 \n", + "min 0.085000 ... 0.013640 0.000650 8.441000 0.000000 \n", + "25% 0.148500 ... 0.024735 0.005925 19.198000 1.000000 \n", + "50% 0.221000 ... 0.038360 0.011660 22.085000 1.000000 \n", + "75% 0.350000 ... 0.060795 0.025640 25.075500 1.000000 \n", + "max 1.302000 ... 
0.169420 0.314820 33.047000 1.000000 \n", + "\n", + " RPDE DFA spread1 spread2 D2 PPE \n", + "count 195.000000 195.000000 195.000000 195.000000 195.000000 195.000000 \n", + "mean 0.498536 0.718099 -5.684397 0.226510 2.381826 0.206552 \n", + "std 0.103942 0.055336 1.090208 0.083406 0.382799 0.090119 \n", + "min 0.256570 0.574282 -7.964984 0.006274 1.423287 0.044539 \n", + "25% 0.421306 0.674758 -6.450096 0.174351 2.099125 0.137451 \n", + "50% 0.495954 0.722254 -5.720868 0.218885 2.361532 0.194052 \n", + "75% 0.587562 0.761881 -5.046192 0.279234 2.636456 0.252980 \n", + "max 0.685151 0.825288 -2.434031 0.450493 3.671155 0.527367 \n", + "\n", + "[8 rows x 23 columns]" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Getting some statistical measures about the data\n", + "parkinsons_data.describe()" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1 147\n", + "0 48\n", + "Name: status, dtype: int64" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Distribution of Target i.e status of a person\n", + "parkinsons_data['status'].value_counts()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Status=1 -> Parkinson's Positive\n", + "#### Status=0 -> Healthy" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "<tr><th></th><th>MDVP:Fo(Hz)</th><th>MDVP:Fhi(Hz)</th><th>MDVP:Flo(Hz)</th><th>MDVP:Jitter(%)</th><th>MDVP:Jitter(Abs)</th><th>MDVP:RAP</th><th>MDVP:PPQ</th><th>Jitter:DDP</th><th>MDVP:Shimmer</th><th>MDVP:Shimmer(dB)</th><th>...</th><th>MDVP:APQ</th><th>Shimmer:DDA</th><th>NHR</th><th>HNR</th><th>RPDE</th><th>DFA</th><th>spread1</th><th>spread2</th><th>D2</th><th>PPE</th></tr>\n",
+ "<tr><th>status</th><th colspan=\"21\"></th></tr>\n",
+ "<tr><th>0</th><td>181.937771</td><td>223.636750</td><td>145.207292</td><td>0.003866</td><td>0.000023</td><td>0.001925</td><td>0.002056</td><td>0.005776</td><td>0.017615</td><td>0.162958</td><td>...</td><td>0.013305</td><td>0.028511</td><td>0.011483</td><td>24.678750</td><td>0.442552</td><td>0.695716</td><td>-6.759264</td><td>0.160292</td><td>2.154491</td><td>0.123017</td></tr>\n",
+ "<tr><th>1</th><td>145.180762</td><td>188.441463</td><td>106.893558</td><td>0.006989</td><td>0.000051</td><td>0.003757</td><td>0.003900</td><td>0.011273</td><td>0.033658</td><td>0.321204</td><td>...</td><td>0.027600</td><td>0.053027</td><td>0.029211</td><td>20.974048</td><td>0.516816</td><td>0.725408</td><td>-5.333420</td><td>0.248133</td><td>2.456058</td><td>0.233828</td></tr>\n",
+ "</table>\n",
+ "<p>2 rows × 22 columns</p>\n",
+ "</div>
" + ], + "text/plain": [ + " MDVP:Fo(Hz) MDVP:Fhi(Hz) MDVP:Flo(Hz) MDVP:Jitter(%) \\\n", + "status \n", + "0 181.937771 223.636750 145.207292 0.003866 \n", + "1 145.180762 188.441463 106.893558 0.006989 \n", + "\n", + " MDVP:Jitter(Abs) MDVP:RAP MDVP:PPQ Jitter:DDP MDVP:Shimmer \\\n", + "status \n", + "0 0.000023 0.001925 0.002056 0.005776 0.017615 \n", + "1 0.000051 0.003757 0.003900 0.011273 0.033658 \n", + "\n", + " MDVP:Shimmer(dB) ... MDVP:APQ Shimmer:DDA NHR HNR \\\n", + "status ... \n", + "0 0.162958 ... 0.013305 0.028511 0.011483 24.678750 \n", + "1 0.321204 ... 0.027600 0.053027 0.029211 20.974048 \n", + "\n", + " RPDE DFA spread1 spread2 D2 PPE \n", + "status \n", + "0 0.442552 0.695716 -6.759264 0.160292 2.154491 0.123017 \n", + "1 0.516816 0.725408 -5.333420 0.248133 2.456058 0.233828 \n", + "\n", + "[2 rows x 22 columns]" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Grouping the data by the target variable; numeric_only=True skips the string 'name' column\n", + "parkinsons_data.groupby('status').mean(numeric_only=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Comparing the group means, healthy subjects average about 182 Hz for MDVP:Fo(Hz) while Parkinson's-positive subjects average about 145 Hz, so a lower fundamental frequency appears to be associated with PD. These are only averages, so we will check with the SVM model whether the features actually separate the two classes " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Exploratory data analysis\n", + " We will use a heatmap of the raw feature values for this purpose.\n", + " In the heatmap, brighter colors represent higher values and \n", + " darker colors represent lower values. " + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "df = parkinsons_data\n", + "sns.heatmap(data=df.drop(columns=['name', 'status']))\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Data Pre-Processing" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [], + "source": [ + "# Separating the features and target\n", + "X = parkinsons_data.drop(columns=['name', 'status']) # columns= already selects the column axis, so axis=1 is not needed\n", + "Y = parkinsons_data['status']" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " MDVP:Fo(Hz) MDVP:Fhi(Hz) MDVP:Flo(Hz) MDVP:Jitter(%) \\\n", + "0 119.992 157.302 74.997 0.00784 \n", + "1 122.400 148.650 113.819 0.00968 \n", + "2 116.682 131.111 111.555 0.01050 \n", + "3 116.676 137.871 111.366 0.00997 \n", + "4 116.014 141.781 110.655 0.01284 \n", + ".. ... ... ... ... \n", + "190 174.188 230.978 94.261 0.00459 \n", + "191 209.516 253.017 89.488 0.00564 \n", + "192 174.688 240.005 74.287 0.01360 \n", + "193 198.764 396.961 74.904 0.00740 \n", + "194 214.289 260.277 77.973 0.00567 \n", + "\n", + " MDVP:Jitter(Abs) MDVP:RAP MDVP:PPQ Jitter:DDP MDVP:Shimmer \\\n", + "0 0.00007 0.00370 0.00554 0.01109 0.04374 \n", + "1 0.00008 0.00465 0.00696 0.01394 0.06134 \n", + "2 0.00009 0.00544 0.00781 0.01633 0.05233 \n", + "3 0.00009 0.00502 0.00698 0.01505 0.05492 \n", + "4 0.00011 0.00655 0.00908 0.01966 0.06425 \n", + ".. ... ... ... ... ... \n", + "190 0.00003 0.00263 0.00259 0.00790 0.04087 \n", + "191 0.00003 0.00331 0.00292 0.00994 0.02751 \n", + "192 0.00008 0.00624 0.00564 0.01873 0.02308 \n", + "193 0.00004 0.00370 0.00390 0.01109 0.02296 \n", + "194 0.00003 0.00295 0.00317 0.00885 0.01884 \n", + "\n", + " MDVP:Shimmer(dB) ... 
MDVP:APQ Shimmer:DDA NHR HNR RPDE \\\n", + "0 0.426 ... 0.02971 0.06545 0.02211 21.033 0.414783 \n", + "1 0.626 ... 0.04368 0.09403 0.01929 19.085 0.458359 \n", + "2 0.482 ... 0.03590 0.08270 0.01309 20.651 0.429895 \n", + "3 0.517 ... 0.03772 0.08771 0.01353 20.644 0.434969 \n", + "4 0.584 ... 0.04465 0.10470 0.01767 19.649 0.417356 \n", + ".. ... ... ... ... ... ... ... \n", + "190 0.405 ... 0.02745 0.07008 0.02764 19.517 0.448439 \n", + "191 0.263 ... 0.01879 0.04812 0.01810 19.147 0.431674 \n", + "192 0.256 ... 0.01667 0.03804 0.10715 17.883 0.407567 \n", + "193 0.241 ... 0.01588 0.03794 0.07223 19.020 0.451221 \n", + "194 0.190 ... 0.01373 0.03078 0.04398 21.209 0.462803 \n", + "\n", + " DFA spread1 spread2 D2 PPE \n", + "0 0.815285 -4.813031 0.266482 2.301442 0.284654 \n", + "1 0.819521 -4.075192 0.335590 2.486855 0.368674 \n", + "2 0.825288 -4.443179 0.311173 2.342259 0.332634 \n", + "3 0.819235 -4.117501 0.334147 2.405554 0.368975 \n", + "4 0.823484 -3.747787 0.234513 2.332180 0.410335 \n", + ".. ... ... ... ... ... 
\n", + "190 0.657899 -6.538586 0.121952 2.657476 0.133050 \n", + "191 0.683244 -6.195325 0.129303 2.784312 0.168895 \n", + "192 0.655683 -6.787197 0.158453 2.679772 0.131728 \n", + "193 0.643956 -6.744577 0.207454 2.138608 0.123306 \n", + "194 0.664357 -5.724056 0.190667 2.555477 0.148569 \n", + "\n", + "[195 rows x 22 columns]\n" + ] + } + ], + "source": [ + "print(X)" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0 1\n", + "1 1\n", + "2 1\n", + "3 1\n", + "4 1\n", + " ..\n", + "190 0\n", + "191 0\n", + "192 0\n", + "193 0\n", + "194 0\n", + "Name: status, Length: 195, dtype: int64\n" + ] + } + ], + "source": [ + "print(Y)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Splitting the data into Training data and Test data" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [], + "source": [ + "# Holding out 20% of the data as test data and keeping the remaining 80% as training data\n", + "X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=2)" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(195, 22) (156, 22) (39, 22)\n" + ] + } + ], + "source": [ + "print(X.shape, X_train.shape, X_test.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Data Standardization\n", + " Kernel SVMs depend on distances between samples, so features measured on very different scales should be brought to the same scale before training " + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [], + "source": [ + "scaler = StandardScaler()" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "StandardScaler()" + ] + }, 
+ "execution_count": 36, + "metadata": {}, + "output_type": 
"execute_result" + } + ], + "source": [ + "scaler.fit(X_train)" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [], + "source": [ + "# Applying the scaler fitted on the training data to both splits so they share the same scale\n", + "X_train = scaler.transform(X_train)\n", + "X_test = scaler.transform(X_test)" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[ 0.63239631 -0.02731081 -0.87985049 ... -0.97586547 -0.55160318\n", + " 0.07769494]\n", + " [-1.05512719 -0.83337041 -0.9284778 ... 0.3981808 -0.61014073\n", + " 0.39291782]\n", + " [ 0.02996187 -0.29531068 -1.12211107 ... -0.43937044 -0.62849605\n", + " -0.50948408]\n", + " ...\n", + " [-0.9096785 -0.6637302 -0.160638 ... 1.22001022 -0.47404629\n", + " -0.2159482 ]\n", + " [-0.35977689 0.19731822 -0.79063679 ... -0.17896029 -0.47272835\n", + " 0.28181221]\n", + " [ 1.01957066 0.19922317 -0.61914972 ... -0.716232 1.23632066\n", + " -0.05829386]]\n" + ] + } + ], + "source": [ + "print(X_train)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Model Training" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "###### Support Vector Machine Model\n", + " SVM models can be used for both classification and regression problems; here we use one to\n", + " classify whether a person is PD positive or negative" + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "metadata": {}, + "outputs": [], + "source": [ + "model = svm.SVC()" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "SVC()" + ] + }, + "execution_count": 54, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Training the SVM model with training data\n", + "model.fit(X_train, Y_train)" + ] + }, + { + "cell_type": "markdown", + "metadata": 
{}, + 
"source": [ + "#### Model Evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "###### Accuracy Score" + ] + }, + { + "cell_type": "code", + "execution_count": 55, + "metadata": {}, + "outputs": [], + "source": [ + "# Accuracy score of training data\n", + "X_train_prediction = model.predict(X_train)\n", + "training_data_accuracy = accuracy_score(Y_train, X_train_prediction)" + ] + }, + { + "cell_type": "code", + "execution_count": 56, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Accuracy Score of Training Data : 0.9166666666666666\n" + ] + } + ], + "source": [ + "print('Accuracy Score of Training Data : ', training_data_accuracy)" + ] + }, + { + "cell_type": "code", + "execution_count": 57, + "metadata": {}, + "outputs": [], + "source": [ + "# Accuracy score of testing data\n", + "X_test_prediction = model.predict(X_test)\n", + "test_data_accuracy = accuracy_score(Y_test, X_test_prediction)" + ] + }, + { + "cell_type": "code", + "execution_count": 58, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Accuracy Score of Test Data : 0.8974358974358975\n" + ] + } + ], + "source": [ + "print('Accuracy Score of Test Data : ', test_data_accuracy)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Note:\n", + " Training accuracy (about 0.92) and test accuracy (about 0.90) are close, which suggests the model is not overfitting. 
With only 39 test samples and imbalanced classes, accuracy alone is a weak check, so per-class metrics are reported next" + ] + }, + { + "cell_type": "code", + "execution_count": 82, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Confusion Matrix : \n", + "\n", + "[[ 4 4]\n", + " [ 0 31]]\n", + "\n", + "Classification Report : \n", + "\n", + " precision recall f1-score support\n", + "\n", + " 0 1.00 0.50 0.67 8\n", + " 1 0.89 1.00 0.94 31\n", + "\n", + " accuracy 0.90 39\n", + " macro avg 0.94 0.75 0.80 39\n", + "weighted avg 0.91 
0.90 0.88 39\n", + "\n" + ] + } + ], + "source": [ + "# Printing the results using classification_report and confusion_matrix for more clarity\n", + "print(\"Confusion Matrix : \")\n", + "print()\n", + "print(confusion_matrix(Y_test, X_test_prediction))\n", + "print()\n", + "print(\"Classification Report : \")\n", + "print()\n", + "print(classification_report(Y_test, X_test_prediction))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Printing Results using Grid Search CV- (Optional)\n", + " Grid search is used to select the most appropriate hyperparameters for our SVC() model, i.e. those that give the \n", + " highest cross-validated accuracy.\n", + " Finding the right parameters (such as which C or gamma values to use) by trial and error, one at a time, is tedious.\n", + " Therefore we define a grid of candidate values and evaluate every combination with cross-validation; this exhaustive approach is \n", + " called Grid Search\n" + ] + }, + { + "cell_type": "code", + "execution_count": 86, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Fitting 5 folds for each of 16 candidates, totalling 80 fits\n", + "[CV] C=0.1, gamma=1 ..................................................\n", + "[CV] ................................... C=0.1, gamma=1, total= 0.0s\n", + "[CV] C=0.1, gamma=1 ..................................................\n", + "[CV] ................................... C=0.1, gamma=1, total= 0.0s\n", + "[CV] C=0.1, gamma=1 ..................................................\n", + "[CV] ................................... C=0.1, gamma=1, total= 0.0s\n", + "[CV] C=0.1, gamma=1 ..................................................\n", + "[CV] ................................... C=0.1, gamma=1, total= 0.0s\n", + "[CV] C=0.1, gamma=1 ..................................................\n", + "[CV] ................................... 
C=0.1, gamma=1, total= 0.0s\n", + "[CV] C=0.1, gamma=0.1 ................................................\n", + "[CV] ................................. C=0.1, gamma=0.1, total= 0.0s\n", + "[CV] C=0.1, gamma=0.1 ................................................\n", + "[CV] ................................. C=0.1, gamma=0.1, total= 0.0s\n", + "[CV] C=0.1, gamma=0.1 ................................................\n", + "[CV] ................................. C=0.1, gamma=0.1, total= 0.0s\n", + "[CV] C=0.1, gamma=0.1 ................................................\n", + "[CV] ................................. C=0.1, gamma=0.1, total= 0.0s\n", + "[CV] C=0.1, gamma=0.1 ................................................\n", + "[CV] ................................. C=0.1, gamma=0.1, total= 0.0s\n", + "[CV] C=0.1, gamma=0.01 ...............................................\n", + "[CV] ................................ C=0.1, gamma=0.01, total= 0.0s\n", + "[CV] C=0.1, gamma=0.01 ...............................................\n", + "[CV] ................................ C=0.1, gamma=0.01, total= 0.0s\n", + "[CV] C=0.1, gamma=0.01 ...............................................\n", + "[CV] ................................ C=0.1, gamma=0.01, total= 0.0s\n", + "[CV] C=0.1, gamma=0.01 ...............................................\n", + "[CV] ................................ C=0.1, gamma=0.01, total= 0.0s\n", + "[CV] C=0.1, gamma=0.01 ...............................................\n", + "[CV] ................................ C=0.1, gamma=0.01, total= 0.0s\n", + "[CV] C=0.1, gamma=0.001 ..............................................\n", + "[CV] ............................... C=0.1, gamma=0.001, total= 0.0s\n", + "[CV] C=0.1, gamma=0.001 ..............................................\n", + "[CV] ............................... C=0.1, gamma=0.001, total= 0.0s\n", + "[CV] C=0.1, gamma=0.001 ..............................................\n", + "[CV] ............................... 
C=0.1, gamma=0.001, total= 0.0s\n", + "[CV] C=0.1, gamma=0.001 ..............................................\n", + "[CV] ............................... C=0.1, gamma=0.001, total= 0.0s\n", + "[CV] C=0.1, gamma=0.001 ..............................................\n", + "[CV] ............................... C=0.1, gamma=0.001, total= 0.0s\n", + "[CV] C=1, gamma=1 ....................................................\n", + "[CV] ..................................... C=1, gamma=1, total= 0.0s\n", + "[CV] C=1, gamma=1 ....................................................\n", + "[CV] ..................................... C=1, gamma=1, total= 0.0s\n", + "[CV] C=1, gamma=1 ....................................................\n", + "[CV] ..................................... C=1, gamma=1, total= 0.0s\n", + "[CV] C=1, gamma=1 ....................................................\n", + "[CV] ..................................... C=1, gamma=1, total= 0.0s\n", + "[CV] C=1, gamma=1 ....................................................\n", + "[CV] ..................................... C=1, gamma=1, total= 0.0s\n", + "[CV] C=1, gamma=0.1 ..................................................\n", + "[CV] ................................... C=1, gamma=0.1, total= 0.0s\n", + "[CV] C=1, gamma=0.1 ..................................................\n", + "[CV] ................................... C=1, gamma=0.1, total= 0.0s\n", + "[CV] C=1, gamma=0.1 ..................................................\n", + "[CV] ................................... C=1, gamma=0.1, total= 0.0s\n", + "[CV] C=1, gamma=0.1 ..................................................\n", + "[CV] ................................... C=1, gamma=0.1, total= 0.0s\n", + "[CV] C=1, gamma=0.1 ..................................................\n", + "[CV] ................................... 
C=1, gamma=0.1, total= 0.0s\n", + "[CV] C=1, gamma=0.01 .................................................\n", + "[CV] .................................. C=1, gamma=0.01, total= 0.0s\n", + "[CV] C=1, gamma=0.01 .................................................\n", + "[CV] .................................. C=1, gamma=0.01, total= 0.0s\n", + "[CV] C=1, gamma=0.01 .................................................\n", + "[CV] .................................. C=1, gamma=0.01, total= 0.0s\n", + "[CV] C=1, gamma=0.01 .................................................\n", + "[CV] .................................. C=1, gamma=0.01, total= 0.0s\n", + "[CV] C=1, gamma=0.01 .................................................\n", + "[CV] .................................. C=1, gamma=0.01, total= 0.0s\n", + "[CV] C=1, gamma=0.001 ................................................\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", + "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[CV] ................................. C=1, gamma=0.001, total= 0.0s\n", + "[CV] C=1, gamma=0.001 ................................................\n", + "[CV] ................................. C=1, gamma=0.001, total= 0.0s\n", + "[CV] C=1, gamma=0.001 ................................................\n", + "[CV] ................................. C=1, gamma=0.001, total= 0.0s\n", + "[CV] C=1, gamma=0.001 ................................................\n", + "[CV] ................................. C=1, gamma=0.001, total= 0.0s\n", + "[CV] C=1, gamma=0.001 ................................................\n", + "[CV] ................................. 
C=1, gamma=0.001, total= 0.0s\n", + "[CV] C=10, gamma=1 ...................................................\n", + "[CV] .................................... C=10, gamma=1, total= 0.0s\n", + "[CV] C=10, gamma=1 ...................................................\n", + "[CV] .................................... C=10, gamma=1, total= 0.0s\n", + "[CV] C=10, gamma=1 ...................................................\n", + "[CV] .................................... C=10, gamma=1, total= 0.0s\n", + "[CV] C=10, gamma=1 ...................................................\n", + "[CV] .................................... C=10, gamma=1, total= 0.0s\n", + "[CV] C=10, gamma=1 ...................................................\n", + "[CV] .................................... C=10, gamma=1, total= 0.0s\n", + "[CV] C=10, gamma=0.1 .................................................\n", + "[CV] .................................. C=10, gamma=0.1, total= 0.0s\n", + "[CV] C=10, gamma=0.1 .................................................\n", + "[CV] .................................. C=10, gamma=0.1, total= 0.0s\n", + "[CV] C=10, gamma=0.1 .................................................\n", + "[CV] .................................. C=10, gamma=0.1, total= 0.0s\n", + "[CV] C=10, gamma=0.1 .................................................\n", + "[CV] .................................. C=10, gamma=0.1, total= 0.0s\n", + "[CV] C=10, gamma=0.1 .................................................\n", + "[CV] .................................. C=10, gamma=0.1, total= 0.0s\n", + "[CV] C=10, gamma=0.01 ................................................\n", + "[CV] ................................. C=10, gamma=0.01, total= 0.0s\n", + "[CV] C=10, gamma=0.01 ................................................\n", + "[CV] ................................. 
C=10, gamma=0.01, total= 0.0s\n", + "[CV] C=10, gamma=0.01 ................................................\n", + "[CV] ................................. C=10, gamma=0.01, total= 0.0s\n", + "[CV] C=10, gamma=0.01 ................................................\n", + "[CV] ................................. C=10, gamma=0.01, total= 0.0s\n", + "[CV] C=10, gamma=0.01 ................................................\n", + "[CV] ................................. C=10, gamma=0.01, total= 0.0s\n", + "[CV] C=10, gamma=0.001 ...............................................\n", + "[CV] ................................ C=10, gamma=0.001, total= 0.0s\n", + "[CV] C=10, gamma=0.001 ...............................................\n", + "[CV] ................................ C=10, gamma=0.001, total= 0.0s\n", + "[CV] C=10, gamma=0.001 ...............................................\n", + "[CV] ................................ C=10, gamma=0.001, total= 0.0s\n", + "[CV] C=10, gamma=0.001 ...............................................\n", + "[CV] ................................ C=10, gamma=0.001, total= 0.0s\n", + "[CV] C=10, gamma=0.001 ...............................................\n", + "[CV] ................................ C=10, gamma=0.001, total= 0.0s\n", + "[CV] C=100, gamma=1 ..................................................\n", + "[CV] ................................... C=100, gamma=1, total= 0.0s\n", + "[CV] C=100, gamma=1 ..................................................\n", + "[CV] ................................... C=100, gamma=1, total= 0.0s\n", + "[CV] C=100, gamma=1 ..................................................\n", + "[CV] ................................... C=100, gamma=1, total= 0.0s\n", + "[CV] C=100, gamma=1 ..................................................\n", + "[CV] ................................... 
C=100, gamma=1, total= 0.0s\n", + "[CV] C=100, gamma=1 ..................................................\n", + "[CV] ................................... C=100, gamma=1, total= 0.0s\n", + "[CV] C=100, gamma=0.1 ................................................\n", + "[CV] ................................. C=100, gamma=0.1, total= 0.0s\n", + "[CV] C=100, gamma=0.1 ................................................\n", + "[CV] ................................. C=100, gamma=0.1, total= 0.0s\n", + "[CV] C=100, gamma=0.1 ................................................\n", + "[CV] ................................. C=100, gamma=0.1, total= 0.0s\n", + "[CV] C=100, gamma=0.1 ................................................\n", + "[CV] ................................. C=100, gamma=0.1, total= 0.0s\n", + "[CV] C=100, gamma=0.1 ................................................\n", + "[CV] ................................. C=100, gamma=0.1, total= 0.0s\n", + "[CV] C=100, gamma=0.01 ...............................................\n", + "[CV] ................................ C=100, gamma=0.01, total= 0.0s\n", + "[CV] C=100, gamma=0.01 ...............................................\n", + "[CV] ................................ C=100, gamma=0.01, total= 0.0s\n", + "[CV] C=100, gamma=0.01 ...............................................\n", + "[CV] ................................ C=100, gamma=0.01, total= 0.0s\n", + "[CV] C=100, gamma=0.01 ...............................................\n", + "[CV] ................................ C=100, gamma=0.01, total= 0.0s\n", + "[CV] C=100, gamma=0.01 ...............................................\n", + "[CV] ................................ C=100, gamma=0.01, total= 0.0s\n", + "[CV] C=100, gamma=0.001 ..............................................\n", + "[CV] ............................... C=100, gamma=0.001, total= 0.0s\n", + "[CV] C=100, gamma=0.001 ..............................................\n", + "[CV] ............................... 
C=100, gamma=0.001, total= 0.0s\n", + "[CV] C=100, gamma=0.001 ..............................................\n", + "[CV] ............................... C=100, gamma=0.001, total= 0.0s\n", + "[CV] C=100, gamma=0.001 ..............................................\n", + "[CV] ............................... C=100, gamma=0.001, total= 0.0s\n", + "[CV] C=100, gamma=0.001 ..............................................\n", + "[CV] ............................... C=100, gamma=0.001, total= 0.0s\n", + "[[ 8 0]\n", + " [ 3 28]]\n", + " precision recall f1-score support\n", + "\n", + " 0 0.73 1.00 0.84 8\n", + " 1 1.00 0.90 0.95 31\n", + "\n", + " accuracy 0.92 39\n", + " macro avg 0.86 0.95 0.90 39\n", + "weighted avg 0.94 0.92 0.93 39\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[Parallel(n_jobs=1)]: Done 80 out of 80 | elapsed: 0.3s finished\n" + ] + } + ], + "source": [ + "param_grid={\"C\": [0.1,1,10,100], \"gamma\": [1, 0.1, 0.01, 0.001]}\n", + "grid = GridSearchCV(svm.SVC(), param_grid, verbose=2)\n", + "grid.fit(X_train, Y_train)\n", + "grid.best_params_ #not displayed here (only a cell's last expression is echoed); wrap in print() to see the chosen C and gamma\n", + "grid.best_estimator_ #likewise not displayed; wrap in print() to see the refitted SVC\n", + "grid_predictions = grid.predict(X_test)\n", + "print(confusion_matrix(Y_test, grid_predictions))\n", + "print(classification_report(Y_test, grid_predictions))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Building a Predictive System" + ] + }, + { + "cell_type": "code", + "execution_count": 73, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[1]\n", + "The Person is suffering from Parkinsons Disease\n" + ] + } + ], + "source": [ + "#Defining a tuple input_data to check whether a person is Parkinson's positive or negative according to our model\n", + "\n", + "#input_data does not contain the status value from the dataset, because that is what the model has to predict\n", + "input_data = 
(95.05600,120.10300,91.22600,0.00532,0.00006,0.00268,0.00332,0.00803,0.02838,0.25500,0.01441,0.01725,0.02444,0.04324,0.01022,21.86200,0.547037,0.798463,-5.011879,0.325996,2.432792,0.271362)\n", + "#input_data =(237.22600,247.32600,225.22700,0.00298,0.00001,0.00169,0.00182,0.00507,0.01752,0.16400,0.01035,0.01024,0.01133,0.03104,0.00740,22.73600,0.305062,0.654172,-7.310550,0.098648,2.416838,0.095032)\n", + "#(In the dataset, the first sample is labelled Parkinson's positive and the second is labelled negative. Other rows can be checked the same way. Our model predicts the same labels for these two samples, which serves as a quick sanity check)\n", + "\n", + "#Converting the input data to a numpy array with np.asarray() for further processing\n", + "input_data_as_numpy_array = np.asarray(input_data)\n", + "\n", + "#Reshape the numpy array to (1, n_features) so the model knows we are predicting for a single data point\n", + "input_data_reshaped = input_data_as_numpy_array.reshape(1,-1)\n", + "\n", + "#Standardize the data with the scaler that was fitted on the training data\n", + "std_data = scaler.transform(input_data_reshaped)\n", + "\n", + "prediction = model.predict(std_data)\n", + "print(prediction)\n", + "\n", + "if (prediction[0]==0):\n", + " print(\"The Person does not have Parkinsons Disease\")\n", + "else:\n", + " print(\"The Person is suffering from Parkinsons Disease\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# *******THANK YOU*******" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}