I used my knowledge of unsupervised learning to predict if cryptocurrencies are affected by 24-hour or 7-day price changes.
- I used pandas and hvplot to work with the crypto_market_data.csv file and visualize the results. I used scikit-learn (sklearn) for unsupervised machine learning, specifically KMeans, PCA, and StandardScaler.
- I created a dataframe with the information from the csv file and used describe() to create a summary statistics table.
- I created a plot to visualize the data I have.
- I used StandardScaler to standardize the data from the csv file (zero mean and unit variance per column). Then, I created a dataframe of the scaled data with the coin name as the index.
- I used the elbow method to find the best value for k. I used KMeans in a for loop to compute the inertia for each candidate value of k.
- I created a dataframe with the resulting k and inertia values and created an elbow plot to determine the best value for k.
- The best possible value for k was 3. After this point the plot seems to level off significantly.
- I initialized the KMeans model with k = 3, fit it to the scaled data I obtained with StandardScaler, and predicted the market clusters.
- I created a scatter plot with price_change_percentage_24h as the x values and price_change_percentage_7d as the y values. I colored the points by market cluster and used hover_cols to display the coin name when hovering over each data point.
- I created a PCA model that reduced the features to three principal components.
- I obtained the explained variance ratio. The array I obtained was array([0.47862164, 0.26608254, 0.1684978 ]). Therefore, the total explained variance of the three principal components is about 91.3%.
- I created a new dataframe with the PCA data and the coin names as the index.
- I used the elbow method again to find the best value for k, this time with the PCA data. I used KMeans in a for loop to compute the inertia for each candidate value of k.
- I created a dataframe with the resulting k and inertia values and created an elbow plot to determine the best value for k with the PCA data.
- The best possible value for k was 4. After this point the plot flattens significantly.
- I initialized the KMeans model with k = 4, fit it to the PCA data, and predicted the market clusters. I created a dataframe with the PCA data and the predicted clusters.
- I created a scatter plot with PC1 as the x values and PC2 as the y values. I colored the points by market cluster and used hover_cols to display the coin name when hovering over each data point.
- I created a composite plot of the original data elbow plot and the PCA data elbow plot.
- I also created a composite plot of the original data market clusters and the PCA data market clusters.
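
The scaling and elbow-method steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual notebook: it uses a small synthetic dataframe in place of crypto_market_data.csv, and the coin names, the index name coin_id, the range of k (1 to 10), and the random_state and n_init settings are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for crypto_market_data.csv (values are made up).
rng = np.random.default_rng(0)
coins = [f"coin_{i}" for i in range(30)]  # hypothetical coin names
market_df = pd.DataFrame(
    {
        "price_change_percentage_24h": rng.normal(0, 3, 30),
        "price_change_percentage_7d": rng.normal(0, 8, 30),
    },
    index=pd.Index(coins, name="coin_id"),  # assumed index name
)

# Standardize each column to zero mean and unit variance,
# then keep the coin names as the index of the scaled dataframe.
scaled = StandardScaler().fit_transform(market_df)
scaled_df = pd.DataFrame(scaled, columns=market_df.columns, index=market_df.index)

# Elbow method: fit KMeans for each candidate k and record the inertia.
k_values = list(range(1, 11))  # assumed range of candidate k values
inertia = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(scaled_df)
    inertia.append(model.inertia_)

# Dataframe of k and inertia values, ready for an elbow plot.
elbow_df = pd.DataFrame({"k": k_values, "inertia": inertia})
print(elbow_df)
```

With hvplot installed and hvplot.pandas imported, a line plot of elbow_df with k on the x axis and inertia on the y axis would give the elbow curve. The best k is read off visually as the point where the curve starts to flatten.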
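
The PCA steps can be sketched in the same hedged way. The example below again uses synthetic data, so its explained variance will not match the project's. The feature names, coin names, sample size, and KMeans settings are assumptions. The only values taken from the write-up are the three reported ratios, which are summed to check the 91.3% figure, and k = 4.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the market data (values are made up).
rng = np.random.default_rng(1)
features = [f"feature_{i}" for i in range(7)]  # hypothetical column names
coins = [f"coin_{i}" for i in range(40)]       # hypothetical coin names
raw = pd.DataFrame(rng.normal(size=(40, 7)), columns=features, index=coins)
scaled = StandardScaler().fit_transform(raw)

# Reduce the features to three principal components.
pca = PCA(n_components=3)
components = pca.fit_transform(scaled)
pca_df = pd.DataFrame(components, columns=["PC1", "PC2", "PC3"], index=raw.index)

# Total explained variance is the sum of the per-component ratios.
total_explained = pca.explained_variance_ratio_.sum()

# Summing the ratios reported in the write-up reproduces the 91.3% figure.
reported = np.array([0.47862164, 0.26608254, 0.1684978])
reported_total = reported.sum()

# Cluster the PCA data with k = 4, the value chosen from the PCA elbow plot.
model = KMeans(n_clusters=4, n_init=10, random_state=0)
pca_df["cluster"] = model.fit_predict(pca_df[["PC1", "PC2", "PC3"]])

print(pca_df.head())
print(f"synthetic total explained variance: {total_explained:.3f}")
print(f"reported total explained variance: {reported_total:.3f}")
```

Keeping the coin names as the index means a scatter plot of PC1 against PC2, colored by the cluster column, can show the coin name on hover.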