
Created an unsupervised machine-learning model that predicts how cryptocurrencies are affected by 24-hour or 7-day price changes


glongo001/CryptoClustering


CryptoClustering

I applied unsupervised learning to predict whether cryptocurrencies are affected by 24-hour or 7-day price changes.

Prepare the Data

  1. I used pandas and hvplot to load the crypto_market_data.csv file and visualize the results. For the unsupervised machine learning I used scikit-learn (sklearn), specifically KMeans, PCA, and StandardScaler.

  2. I created a DataFrame from the CSV file and used describe() to generate a summary statistics table.


  3. I created a plot to visualize the raw data.


  4. I used StandardScaler to standardize the data (each column scaled to zero mean and unit variance). Then I created a DataFrame of the scaled data with the coin names as the index.
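A minimal sketch of the preparation steps with pandas and scikit-learn. The `coin_id` index name, the `scale_market_data` helper, and the small randomly generated stand-in DataFrame are assumptions so the example runs without the CSV; the hvplot visualization is omitted.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def scale_market_data(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize each column to mean 0 and standard deviation 1, keeping the coin index."""
    scaled_values = StandardScaler().fit_transform(df)
    return pd.DataFrame(scaled_values, columns=df.columns, index=df.index)


# With the real file this would be something like (index column name is an assumption):
# df_market_data = pd.read_csv("crypto_market_data.csv", index_col="coin_id")
# Hypothetical stand-in data so the sketch runs without the CSV:
rng = np.random.default_rng(42)
df_market_data = pd.DataFrame(
    {
        "price_change_percentage_24h": rng.normal(0.0, 2.0, size=8),
        "price_change_percentage_7d": rng.normal(0.0, 6.0, size=8),
    },
    index=pd.Index([f"coin_{i}" for i in range(8)], name="coin_id"),
)

print(df_market_data.describe())  # summary statistics table
df_market_data_scaled = scale_market_data(df_market_data)
print(df_market_data_scaled.head())
```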


Find the Best Value for k Using the Original Scaled DataFrame

  1. I used the elbow method to find the best value for k, running KMeans in a for loop over a range of k values and recording the inertia for each.

  2. I created a DataFrame of the k and inertia values and plotted an elbow curve to determine the best value for k.

    • The best value for k was 3; beyond this point the curve levels off significantly.
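The elbow loop could look roughly like this. The `elbow_table` helper, the k range of 1 to 11, and the `make_blobs` stand-in data are assumptions for illustration, not the notebook's exact code.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def elbow_table(data, k_values) -> pd.DataFrame:
    """Fit KMeans once per k and record the inertia (within-cluster sum of squares)."""
    inertia = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=1)
        model.fit(data)
        inertia.append(model.inertia_)
    return pd.DataFrame({"k": list(k_values), "inertia": inertia})


# Hypothetical stand-in for the scaled market data: 60 points in 3 loose groups.
X, _ = make_blobs(n_samples=60, centers=3, n_features=2, random_state=0)
df_elbow = elbow_table(X, range(1, 12))
print(df_elbow)
```

Inertia generally keeps falling as k grows, so the goal is the point where adding another cluster stops buying much improvement, not the minimum.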


Cluster Cryptocurrencies with K-means Using the Original Scaled Data

  1. I initialized the KMeans model with k = 3, fit it to the scaled data from StandardScaler, and predicted the market clusters.


  2. I created a scatter plot with price_change_percentage_24h on the x-axis and price_change_percentage_7d on the y-axis. I colored the points by market cluster and used hover_cols to display the coin name when hovering over each data point.
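A sketch of fitting the k = 3 model and attaching the predicted clusters for plotting. The stand-in data, the `cluster` column name, and the `coin_id` index name are hypothetical; the hvplot call is left as a comment because it needs hvplot and a notebook to render.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for the scaled DataFrame (coin names as the index).
rng = np.random.default_rng(7)
centers = np.array([[-2.0, -2.0], [0.0, 2.0], [3.0, 0.0]])
points = np.vstack([c + rng.normal(0, 0.3, size=(5, 2)) for c in centers])
df_scaled = pd.DataFrame(
    points,
    columns=["price_change_percentage_24h", "price_change_percentage_7d"],
    index=pd.Index([f"coin_{i}" for i in range(len(points))], name="coin_id"),
)

# Fit with the k chosen from the elbow curve, then label every coin.
model = KMeans(n_clusters=3, n_init=10, random_state=1)
model.fit(df_scaled)
clusters = model.predict(df_scaled)

# Copy the data so the scaled DataFrame itself is left untouched.
df_clusters = df_scaled.copy()
df_clusters["cluster"] = clusters
print(df_clusters)

# With hvplot installed, the scatter plot could be drawn roughly as:
# df_clusters.hvplot.scatter(
#     x="price_change_percentage_24h", y="price_change_percentage_7d",
#     by="cluster", hover_cols=["coin_id"])
```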


Optimize Clusters with Principal Component Analysis

  1. I created a PCA model that reduced the features to three principal components.

  2. I obtained the explained variance ratio: array([0.47862164, 0.26608254, 0.1684978 ]). Together, the three principal components explain about 91.3% of the total variance (0.4786 + 0.2661 + 0.1685 ≈ 0.9132).

  3. I created a new DataFrame with the PCA data and the coin names as the index.
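A sketch of the PCA step, assuming the scaled data is a DataFrame indexed by coin. The random stand-in data and `feature_*` names are hypothetical. The last lines only check the README's arithmetic: the three reported ratios sum to about 0.913.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical stand-in for the scaled market data (coin names as the index).
rng = np.random.default_rng(3)
df_scaled = pd.DataFrame(
    rng.normal(size=(20, 5)),
    columns=[f"feature_{i}" for i in range(5)],
    index=pd.Index([f"coin_{i}" for i in range(20)], name="coin_id"),
)

# Reduce to three principal components and inspect how much variance each keeps.
pca = PCA(n_components=3)
pca_values = pca.fit_transform(df_scaled)
ratios = pca.explained_variance_ratio_
print(ratios, f"total: {ratios.sum():.1%}")

# New DataFrame with the PCA data, keeping the coin names as the index.
df_pca = pd.DataFrame(pca_values, columns=["PC1", "PC2", "PC3"], index=df_scaled.index)

# The total reported in the README is the sum of its three ratios.
reported = [0.47862164, 0.26608254, 0.1684978]
print(f"{sum(reported):.1%}")  # 91.3%
```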


Find the Best Value for k Using the PCA Data

  1. I used the elbow method again, this time on the PCA data, running KMeans in a for loop over a range of k values and recording the inertia for each.

  2. I created a DataFrame of the k and inertia values and plotted an elbow curve to determine the best value for k for the PCA data.

    • The best value for k was 4; beyond this point the curve flattens significantly.
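The PCA elbow loop is the same procedure with the PCA output as input. As a sanity check on hypothetical random data: the k = 1 inertia is the total sum of squares, so the PCA curve starts lower than the original one by a factor equal to the summed explained variance ratio, which is worth keeping in mind when comparing the two elbow plots.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data, standardized the same way as the market data.
rng = np.random.default_rng(5)
scaled = StandardScaler().fit_transform(rng.normal(size=(40, 6)))

# Project onto three principal components, as in the PCA step.
pca = PCA(n_components=3)
pca_data = pca.fit_transform(scaled)

# Same elbow loop as before (k range assumed), but fed the PCA data.
k_values = list(range(1, 12))
inertia = [KMeans(n_clusters=k, n_init=10, random_state=1).fit(pca_data).inertia_ for k in k_values]
df_elbow_pca = pd.DataFrame({"k": k_values, "inertia": inertia})
print(df_elbow_pca)

# At k=1 inertia is the total sum of squares, so this ratio equals the variance PCA kept.
inertia_k1_scaled = KMeans(n_clusters=1, n_init=10, random_state=1).fit(scaled).inertia_
print(inertia[0] / inertia_k1_scaled, pca.explained_variance_ratio_.sum())
```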


Cluster Cryptocurrencies with K-means Using the PCA Data

  1. I initialized the KMeans model with k = 4, fit it to the PCA data, and predicted the market clusters. I then created a DataFrame containing the PCA data and the predicted clusters.


  2. I created a scatter plot with PC1 on the x-axis and PC2 on the y-axis. I colored the points by market cluster and used hover_cols to display the coin name when hovering over each data point.
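A sketch of the k = 4 model on the PCA data, using `fit_predict` (equivalent to calling `fit` and then `predict` on the same data). The stand-in points, the `cluster` column, and the `coin_id` index name are hypothetical; the hvplot call is commented out because it needs hvplot to render.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for the PCA DataFrame: four tight groups in PC space.
rng = np.random.default_rng(11)
centers = np.array([[-3, -3, 0], [-3, 3, 0], [3, -3, 0], [3, 3, 0]], dtype=float)
points = np.vstack([c + rng.normal(0, 0.4, size=(4, 3)) for c in centers])
df_pca = pd.DataFrame(
    points,
    columns=["PC1", "PC2", "PC3"],
    index=pd.Index([f"coin_{i}" for i in range(len(points))], name="coin_id"),
)

# Fit with k=4 (the elbow value for the PCA data) and predict each coin's cluster.
model_pca = KMeans(n_clusters=4, n_init=10, random_state=1)
clusters_pca = model_pca.fit_predict(df_pca)

# DataFrame holding the PCA coordinates plus the predicted cluster.
df_pca_clusters = df_pca.copy()
df_pca_clusters["cluster"] = clusters_pca
print(df_pca_clusters)

# With hvplot installed, the PC1 vs PC2 scatter could be drawn roughly as:
# df_pca_clusters.hvplot.scatter(x="PC1", y="PC2", by="cluster", hover_cols=["coin_id"])
```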


Visualize and Compare the Results

  1. I created a composite plot comparing the elbow curves for the original scaled data and the PCA data.

  2. I also created a composite plot comparing the market clusters from the original scaled data and the PCA data.
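In hvplot, two plots can be placed side by side with the `+` operator, which builds a HoloViews layout. The sketch below swaps in matplotlib so it runs outside a notebook; the `composite_elbow_figure` helper and the inertia values are hypothetical and only exercise the plotting code. Sharing the y-axis makes the two elbow curves directly comparable, and the cluster scatter plots can be combined the same way.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def composite_elbow_figure(df_original: pd.DataFrame, df_pca: pd.DataFrame):
    """Draw both elbow curves side by side with a shared y-axis."""
    fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
    for ax, df, title in zip(axes, (df_original, df_pca), ("Original scaled data", "PCA data")):
        ax.plot(df["k"], df["inertia"], marker="o")
        ax.set_title(title)
        ax.set_xlabel("k")
    axes[0].set_ylabel("inertia")
    fig.tight_layout()
    return fig


# Hypothetical inertia values, used only to exercise the plotting code.
df_elbow = pd.DataFrame({"k": [1, 2, 3, 4], "inertia": [280.0, 190.0, 120.0, 100.0]})
df_elbow_pca = pd.DataFrame({"k": [1, 2, 3, 4], "inertia": [255.0, 160.0, 95.0, 60.0]})
fig = composite_elbow_figure(df_elbow, df_elbow_pca)
```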
